public inbox for linux-kernel@vger.kernel.org
* Likely race between sys_rt_sigtimedwait() and complete_signal()
@ 2011-02-10  7:32 Nikita V. Youshchenko
From: Nikita V. Youshchenko @ 2011-02-10  7:32 UTC
  To: linux-kernel; +Cc: Alexander Kaliadin, oishi.y

Hello linux-kernel.

Within a project we are working on, we are facing a "rare" situation where
setitimer() / sigwait()-based periodic task execution hangs. "Rare"
means once every several hours with a 1000 Hz timer.
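
For readers unfamiliar with the pattern, here is a minimal sketch of a
setitimer() / sigwait() periodic loop. It is not the code from the project
described here; the helper name run_periodic() and its parameters are
hypothetical, and it assumes a single-threaded caller.

```c
#define _GNU_SOURCE   /* sigwait()/setitimer() under strict -std= modes */
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper (illustration only): wait for `iterations` ticks of
 * a periodic ITIMER_REAL timer. Returns the number of ticks received,
 * or -1 on error. */
int run_periodic(long period_us, int iterations)
{
    sigset_t set;
    struct itimerval it;
    struct itimerval off = { {0, 0}, {0, 0} };
    int sig, ticks = 0;

    if (period_us <= 0)
        return -1;   /* a zero it_value would disarm the timer */

    /* Block SIGALRM so it stays pending until sigwait() collects it.
     * A multi-threaded program would block it in every thread,
     * typically with pthread_sigmask() before creating the threads. */
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    it.it_interval.tv_sec = period_us / 1000000;
    it.it_interval.tv_usec = period_us % 1000000;
    it.it_value = it.it_interval;   /* first expiry after one period */
    if (setitimer(ITIMER_REAL, &it, NULL) != 0)
        return -1;

    while (ticks < iterations) {
        if (sigwait(&set, &sig) != 0)
            break;
        if (sig == SIGALRM)
            ticks++;   /* the periodic work would go here */
    }

    setitimer(ITIMER_REAL, &off, NULL);   /* disarm the timer */
    return ticks;
}
```

Calling run_periodic(1000, 5) waits for five ticks of a 1000 Hz timer. The
hang described in this report corresponds to sigwait() never returning
even though SIGALRM is pending.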

For the hung thread, cat /proc/pid/status shows

...
State:	S (sleeping)
...
SigPnd:	0000000000000000
ShdPnd:	0000000000002000
SigBlk:	0000000000000000
...

and SysRq - T shows

[<c015b1b0>] (__schedule+0x2fc/0x37c) from [<c015b7b8>] (schedule+0x1c/0x30)
[<c015b7b8>] (schedule+0x1c/0x30) from [<c015b8c4>] (schedule_timeout+0x18/0x1dc)
[<c015b8c4>] (schedule_timeout+0x18/0x1dc) from [<c004a084>] (sys_rt_sigtimedwait+0x1b4/0x288)
[<c004a084>] (sys_rt_sigtimedwait+0x1b4/0x288) from [<c001cf00>] (ret_fast_syscall+0x0/0x28)

All other threads have SIGALRM blocked, as they should; checking
/proc/X/status for each of them confirms this.

So for some reason, SIGALRM was successfully sent by the timer and its bit
was set in ShdPnd [I guess at the bottom of __send_signal()], but somehow
the thread still went into schedule() and was never woken.
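
For reference, the ShdPnd value shown above can be decoded mechanically:
in the /proc status masks, bit (n - 1) stands for signal n. The sketch
below illustrates this; the helper name decode_sigmask() is hypothetical
and is not part of any kernel or libc API.

```c
#include <stdint.h>

/* Hypothetical helper (illustration only): store in `out` the signal
 * numbers whose bits are set in a /proc/<pid>/status mask such as
 * ShdPnd. Bit (n - 1) corresponds to signal n. Returns the number of
 * signals stored, at most `max_out`. */
int decode_sigmask(uint64_t mask, int *out, int max_out)
{
    int n = 0;
    int sig;

    for (sig = 1; sig <= 64 && n < max_out; sig++) {
        if (mask & (UINT64_C(1) << (sig - 1)))
            out[n++] = sig;
    }
    return n;
}
```

Passing 0x2000 yields a single entry, signal 14, which is SIGALRM on Linux
on ARM and x86. So the timer signal is pending process-wide while the
thread is still asleep in sys_rt_sigtimedwait().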

I guess this is some sort of race between sys_rt_sigtimedwait() and 
complete_signal().

This is on an embedded system running a vendor 2.6.31-based kernel; moving
forward is unfortunately impossible because of hardware support issues.
However, I've looked through

git log -p HEAD..linus/master -- kernel/signal.c

and did not notice anything that could be related.

Unfortunately we don't currently have resources for further analysis,
especially since simple workarounds exist, such as switching to
timerfd-based periodic execution (which appears to work without hangs).

However, I guess the race we faced still exists in the current upstream
kernel, so maybe somebody on this mailing list would be interested in
looking at this?

Nikita


Thread overview: 3+ messages
     [not found] <20110407141215.46d0b930.akpm@linux-foundation.org>
2011-04-09 13:45 ` Likely race between sys_rt_sigtimedwait() and complete_signal() Oleg Nesterov
2011-04-09 19:44   ` Nikita V. Youshchenko
2011-02-10  7:32 Nikita V. Youshchenko
