From: Oleg Nesterov <oleg@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>,
"Nikita V. Youshchenko" <nyoushchenko@mvista.com>,
Alexander Kaliadin <akaliadin@mvista.com>,
oishi.y@sys.yzk.co.jp
Cc: linux-kernel@vger.kernel.org
Subject: Re: Likely race between sys_rt_sigtimedwait() and complete_signal()
Date: Sat, 9 Apr 2011 15:45:20 +0200
Message-ID: <20110409134520.GA19651@redhat.com>
In-Reply-To: <20110407141215.46d0b930.akpm@linux-foundation.org>
Can't find the original email, replying to Andrew's fwd.
On 04/07, Andrew Morton wrote:
>
> Within project we are working on, we are facing a "rare" situation when
> setitimer() / sigwait() - based periodic task execution hangs. "Rare"
> means once per several hours for 1000 Hz timer.
>
> For hanged thread, cat /proc/pid/status shows
>
> ...
> State: S (sleeping)
> ...
> SigPnd: 0000000000000000
> ShdPnd: 0000000000002000
> SigBlk: 0000000000000000
> ...
>
> and SysRq - T shows
>
> [<c015b1b0>] (__schedule+0x2fc/0x37c) from [<c015b7b8>]
> (schedule+0x1c/0x30)
> [<c015b7b8>] (schedule+0x1c/0x30) from [<c015b8c4>]
> (schedule_timeout+0x18/0x1dc)
> [<c015b8c4>] (schedule_timeout+0x18/0x1dc) from [<c004a084>]
> (sys_rt_sigtimedwait+0x1b4/0x288)
> [<c004a084>] (sys_rt_sigtimedwait+0x1b4/0x288) from [<c001cf00>]
> (ret_fast_syscall+0x0/0x28)
Is this thread the group leader?
> All other threads have SIGALRM blocked as they should, looking
> through /proc/X/status proves this.
Did they ever have SIGALRM unblocked?
> So for some reason, SIGALRM was successfully delivered by timer, bit was
> set in ShdPnd [I guess at the bottom of __send_signal()], but that still
> resulted somehow in thread going to schedule() and not waking.
Thanks for the detailed report.
There is an old, ancient problem which I constantly forget to fix.
It _can_ perfectly explain the hang, at least in theory. I'll try
to make the patch on Monday.
In short: if a thread T runs with SIGALRM unblocked while another
thread sleeps in sigtimedwait(), and then T blocks SIGALRM, the
signal can be "lost" as above.
Does your application do something like this? If not, then there
is another problem.
> This is on embedded system running vendor 2.6.31-based kernel, moving
> forward is unfortunately impossible because of hardware support issues.
If I make the patch for 2.6.31, any chance you can test it?
> However I guess the race we faced still exists in the current upstream
> kernel,
Yes, this is possible. OTOH, the bug can be anywhere, not necessarily in
signal.c, and it might be already fixed.
Oleg.