From: Darren Hart <dvhltc@us.ibm.com>
To: Blaise Gassend <blaise@willowgarage.com>
Cc: Jeremy Leibs <leibs@willowgarage.com>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
"lkml, " <linux-kernel@vger.kernel.org>
Subject: Re: ERESTARTSYS escaping from sem_wait with RTLinux patch
Date: Mon, 12 Oct 2009 21:54:36 -0700 [thread overview]
Message-ID: <4AD4080C.20703@us.ibm.com> (raw)
In-Reply-To: <4AD3FFB0.5030405@us.ibm.com>
Darren Hart wrote:
> Resending, hopefully with fixed whitespace mangling in the trace this
> time...
>
> Darren Hart wrote:
>> Darren Hart wrote:
>>> Blaise Gassend wrote:
>>>> A few more questions you may have answers to:
>>>>
>>>> Do you have any idea what this comment in futex.c could be referring
>>>> to?
>>>>
>>>> /* * We expect signal_pending(current), but another thread may *
>>>> have handled it for us already. */
>>>> As far as I have been able to understand, signals are thread-specific,
>>>> and hence it doesn't make sense to me that another thread could have
>>>> handled it.
>>>
>>> Signals are only thread specific when using something like
>>> pthread_kill() to send the signal, otherwise they are process wide.
>>>
>>>>
>>>>> OK, so I suspect one of two things.
>>>>>
>>>>> 1) Recent changes to futex.c have somehow created a wakeup race and
>>>>> unqueue_me() doesn't detect it was woken with FUTEX_WAKE, then
>>>>> falls
>>>>> out through the ERESTARTSYS path.
>>>>>
>>>>> 2) Recent changes have exposed an existing race in unqueue_me().
>>>>
>>>> Is it possible that there aren't many people using PREEMPT RT on 8 CPU
>>>> machines, and hence this is a bug that just has't been observed yet?
>>>
>>> We actually do extensive testing on 8way systems with some large apps
>>> that beat on futexes pretty badly. You've simply uncovered a nasty
>>> little race in the wakeup path.
>>>
>>> I believe I have identified the patch where this became possible (I
>>> don't say the cause of the bug, because it's possible this patch simply
>>> exposed an existing race):
>>>
>>> 928686b77ab275fd7f828ff24bd510baca995425 futex: Wake up waiter outside
>>> the hb->lock section
>>>
>>> I am currently instrumenting the futex code and trying to identify how
>>> the race occurs.
>
> ...
>
>>> Full output here:
>
> ...
>
>> http://dvhart.com/darren/files/futex_wake_function.trace.gz
>>
>> It's a tad difficult to navigate, but I believe I am seeing
>> wake_futex_list() try and wake python-3490 without previously adding
>> it to the wake-list. If we are somehow not cleaning up our wake_list,
>> this would explain why unqueue_me() sees the q->lock_ptr as non-null -
>> wake_futex() wasn't called to clear it.
>
> OK, I believe I can confirm this with this subset of the trace. It follows
> three futex_wait and wake-up cycles. The third wake-up occurs without the
> python-3490 thread ever having been added to the wake_list (at least, there
> is not record of it in the trace). Now to see why this might be the case...
>
> python-3490 [002] 259.041420: futex_wait <-do_futex
> python-3490 [002] 259.041420: futex_wait_setup <-futex_wait
> python-3490 [002] 259.043888: futex_wait_queue_me <-futex_wait
> python-3490 [002] 259.043888: queue_me <-futex_wait_queue_me
> python-3490 [002] 259.043920: schedule <-futex_wait_queue_me
> python-3507 [004] 259.043929: wake_futex: adding python-3490 to
> wake_list
> python-3507 [004] 259.043957: wake_futex_list: wake_futex_list:
> waking python-3490
> python-3490 [002] 259.043981: futex_wait: normal futex wake-up
> detected for python-3490
>
> python-3490 [002] 259.043987: futex_wait <-do_futex
> python-3490 [002] 259.043987: futex_wait_setup <-futex_wait
> python-3490 [002] 259.044323: futex_wait_queue_me <-futex_wait
> python-3490 [002] 259.044323: queue_me <-futex_wait_queue_me
> python-3495 [002] 259.044571: wake_futex: adding python-3490 to
> wake_list
Interesting, here we never see a wake_futex_list: waking python-3490.
So the task wakes here and thinks it is a normal wakeup, when perhaps it is
not. If a timeout or a signal were to occur here, we would not detect them
as unqueue_me() would see the lock_ptr had been nulled by wake_futex(). The
task returns to userspace ignoring the timeout or signal.
> python-3490 [002] 259.044843: futex_wait: normal futex wake-up
> detected for python-3490
>
> python-3490 [002] 259.044848: futex_wait <-do_futex
The app then puts it back to sleep here.
> python-3490 [002] 259.044848: futex_wait_setup <-futex_wait
> python-3490 [002] 259.046648: futex_wait_queue_me <-futex_wait
> python-3490 [002] 259.046648: queue_me <-futex_wait_queue_me
> python-3490 [002] 259.046664: schedule <-futex_wait_queue_me
> ********* python-3490 was never added to the wake_list !!!!!!!
> *********
>
> python-3495 [002] 259.046680: wake_futex_list: wake_futex_list:
> waking python-3490
When 3495 finally get's to run and complete it's futex_wake() call, the task
still needs to be woken, so we wake it - but now it's enqueued with a different
futex_q, which now has a valid lock_ptr, so upon wake-up we expect a signal!
OK, I believe this establishes root cause. Now to come up with a fix...
> python-3490 [002] 259.046816: futex_wait: returning 1, non-futex
> wakeup for python-3490
> python-3490 [002] 259.046817: futex_wait: p->futex_wakeup: (null)
> python-3490 [002] 259.046819: futex_wait: error in wake-up
> detection, no signal pending for python-3490
Thanks,
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
next prev parent reply other threads:[~2009-10-13 4:55 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-10-10 9:09 ERESTARTSYS escaping from sem_wait with RTLinux patch Blaise Gassend
2009-10-10 16:40 ` ERESTARTSYS escaping from sem_wait with Preempt-RT Blaise Gassend
2009-10-10 17:59 ` ERESTARTSYS escaping from sem_wait with RTLinux patch Thomas Gleixner
2009-10-10 19:08 ` Jeremy Leibs
2009-10-11 2:07 ` Jeremy Leibs
2009-10-11 5:48 ` Jeremy Leibs
2009-10-12 14:16 ` Darren Hart
[not found] ` <1255384010.10236.123.camel@lts.willowgarage.com>
[not found] ` <4AD3BD57.6080703@us.ibm.com>
[not found] ` <4AD3D6AE.2050609@us.ibm.com>
[not found] ` <4AD3FFB0.5030405@us.ibm.com>
2009-10-13 4:54 ` Darren Hart [this message]
2009-10-13 8:56 ` Blaise Gassend
2009-10-13 15:13 ` Darren Hart
2009-10-13 18:45 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4AD4080C.20703@us.ibm.com \
--to=dvhltc@us.ibm.com \
--cc=blaise@willowgarage.com \
--cc=leibs@willowgarage.com \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.