From: Oleg Nesterov <oleg@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>,
"Nikita V. Youshchenko" <nyoushchenko@mvista.com>,
Alexander Kaliadin <akaliadin@mvista.com>,
oishi.y@sys.yzk.co.jp
Cc: linux-kernel@vger.kernel.org
Subject: Re: Likely race between sys_rt_sigtimedwait() and complete_signal()
Date: Sat, 9 Apr 2011 15:45:20 +0200
Message-ID: <20110409134520.GA19651@redhat.com>
In-Reply-To: <20110407141215.46d0b930.akpm@linux-foundation.org>
Can't find the original email, replying to Andrew's fwd.
On 04/07, Andrew Morton wrote:
>
> Within project we are working on, we are facing a "rare" situation when
> setitimer() / sigwait() - based periodic task execution hangs. "Rare"
> means once per several hours for 1000 Hz timer.
>
> For hanged thread, cat /proc/pid/status shows
>
> ...
> State: S (sleeping)
> ...
> SigPnd: 0000000000000000
> ShdPnd: 0000000000002000
> SigBlk: 0000000000000000
> ...
>
> and SysRq - T shows
>
> [<c015b1b0>] (__schedule+0x2fc/0x37c) from [<c015b7b8>]
> (schedule+0x1c/0x30)
> [<c015b7b8>] (schedule+0x1c/0x30) from [<c015b8c4>]
> (schedule_timeout+0x18/0x1dc)
> [<c015b8c4>] (schedule_timeout+0x18/0x1dc) from [<c004a084>]
> (sys_rt_sigtimedwait+0x1b4/0x288)
> [<c004a084>] (sys_rt_sigtimedwait+0x1b4/0x288) from [<c001cf00>]
> (ret_fast_syscall+0x0/0x28)
Is this thread the group leader?
> All other threads have SIGALRM blocked as they should, looking
> through /proc/X/status proves this.
Did they ever have SIGALRM unblocked?
> So for some reason, SIGALRM was successfully delivered by timer, bit was
> set in ShdPnd [I guess at the bottom of __send_signal()], but that still
> resulted somehow in thread going to schedule() and not waking.
Thanks for the detailed report.
There is an ancient problem which I constantly forget to fix.
It _can_ perfectly explain the hang, at least in theory. I'll try
to make the patch on Monday.
In short: if a thread T runs with SIGALRM unblocked while another
thread sleeps in sigtimedwait(), and then T blocks SIGALRM, the
signal can be "lost" as above.
Does your application do something like this? If not, then there
is another problem.
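To be concrete, here is a minimal sketch of the pattern I mean. It is
not a reproducer: the signal is sent only after T has blocked SIGALRM
again, so it always reaches the waiter. The suspect window is marked
in a comment. All names (waiter, thread_t, run_pattern, on_alarm) are
made up for illustration, error checking is omitted, and it may need
-pthread to build:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* No-op handler, so that SIGALRM never kills the process while unblocked. */
static void on_alarm(int sig) { (void)sig; }

/* Waiter: keeps SIGALRM blocked and collects it synchronously. */
static void *waiter(void *arg)
{
    struct timespec timeout = { .tv_sec = 2, .tv_nsec = 0 };
    int *result = arg;
    sigset_t set;
    int sig;

    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    do {
        sig = sigtimedwait(&set, NULL, &timeout);
    } while (sig < 0 && errno == EINTR);
    *result = sig;              /* SIGALRM, or -1 on timeout */
    return NULL;
}

/* T: runs with SIGALRM unblocked for a while, then blocks it. */
static void *thread_t(void *arg)
{
    sigset_t set;

    (void)arg;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);
    /*
     * Suspect window: a process-wide SIGALRM generated here can,
     * as far as I can see, pick T rather than the waiter for the wakeup.
     */
    pthread_sigmask(SIG_BLOCK, &set, NULL);
    return NULL;
}

/* Set up both threads, send one SIGALRM, return what the waiter got. */
static int run_pattern(void)
{
    struct sigaction sa;
    pthread_t w, t;
    sigset_t set;
    int result = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, NULL);

    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    pthread_sigmask(SIG_BLOCK, &set, NULL);  /* inherited by new threads */

    pthread_create(&w, NULL, waiter, &result);
    pthread_create(&t, NULL, thread_t, NULL);
    pthread_join(t, NULL);      /* T has blocked SIGALRM again by now */

    kill(getpid(), SIGALRM);    /* process-wide, like an itimer signal */
    pthread_join(w, NULL);
    return result;
}
```

In this ordering run_pattern() returns SIGALRM. If your threads change
their masks like thread_t() does while the timer is running, that
matches the case I have in mind.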
> This is on embedded system running vendor 2.6.31-based kernel, moving
> forward is unfortunately impossible because of hardware support issues.
If I make the patch for 2.6.31, any chance you can test it?
> However I guess the race we faced still exists in the current upstream
> kernel,
Yes, this is possible. OTOH, the bug can be anywhere, not necessarily in
signal.c, and it might be already fixed.
Oleg.