* Debugging kernel semaphore contention and priority inversion
From: Davda, Bhavesh P (Bhavesh) @ 2005-08-17 22:52 UTC
To: linux-kernel
Is there a way to know which task has a particular (struct semaphore *)
down()ed, leading to another task's down() blocking on it?
I'm trying to debug a priority inversion that appears to be caused by a
very low priority SCHED_OTHER task (possibly a kernel thread like
kjournald) holding the inode->i_sem semaphore of a file, which blocks a
write() to that file from a high-priority (50) SCHED_FIFO task.
It would be helpful to get a kernel stacktrace for the culprit too.
Thanks
- Bhavesh
PS: Ideally I would like to do this from a module/probe I can insert into
a system that is stuck in this state, because I don't want to Heisenberg
the setup with kdb or an otherwise instrumented kernel.
Bhavesh P. Davda | Distinguished Member of Technical Staff | Avaya |
1300 West 120th Avenue | B3-B03 | Westminster, CO 80234 | U.S.A. |
Voice/Fax: 303.538.4438 | bhavesh@avaya.com
* Re: Debugging kernel semaphore contention and priority inversion
From: Keith Mannthey @ 2005-08-17 23:33 UTC
To: Davda, Bhavesh P (Bhavesh); +Cc: linux-kernel
On 8/17/05, Davda, Bhavesh P (Bhavesh) <bhavesh@avaya.com> wrote:
> Is there a way to know which task has a particular (struct semaphore *)
> down()ed, leading to another task's down() blocking on it?
I would add a field to struct semaphore that tracks the owning process.
In your various up() and down() calls, have that field track the
"current" process.
Do you know which semaphore it is?
This way, when you dump the semaphore, you can see which task is holding
it. Have the module dump the semaphore and you can ID the task.
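A minimal user-space sketch of the owner-tracking idea above, not kernel
code. It assumes a hypothetical wrapper, struct tracked_sem, with helper
names invented for illustration; a real patch would record "current"
inside down() and up() rather than a PID from getpid().

```c
#include <stdatomic.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical model of a mutex-style semaphore that remembers its holder.
 * The kernel's struct semaphore has no such field; this only illustrates
 * the suggested instrumentation. */
struct tracked_sem {
    atomic_int count;      /* 1 = free, 0 = held */
    atomic_int owner_pid;  /* 0 while nobody holds the semaphore */
};

void tracked_init(struct tracked_sem *t)
{
    atomic_init(&t->count, 1);
    atomic_init(&t->owner_pid, 0);
}

/* Non-blocking acquire. Returns 0 on success and non-zero if the semaphore
 * is already held (the same convention as down_trylock()). */
int tracked_trydown(struct tracked_sem *t)
{
    int expected = 1;

    if (!atomic_compare_exchange_strong(&t->count, &expected, 0))
        return 1;
    /* Record the holder only after the semaphore has been taken. */
    atomic_store(&t->owner_pid, (int)getpid());
    return 0;
}

/* Release: clear the holder record first, then make the semaphore free. */
void tracked_up(struct tracked_sem *t)
{
    atomic_store(&t->owner_pid, 0);
    atomic_store(&t->count, 1);
}

/* What a debug dump would read: the holder's PID, or 0 if unheld. */
int tracked_holder(struct tracked_sem *t)
{
    return atomic_load(&t->owner_pid);
}
```

A dump routine would call tracked_holder() on the suspect semaphore and
look that PID up in the task list. The cost is the one raised later in
the thread: the field only exists if the kernel was built with it.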
> It would be helpful to get a kernel stacktrace for the culprit too.
Have you tried sysrq t? See the Documentation/sysrq.txt file.
How stuck is the system?
Keith
* RE: Debugging kernel semaphore contention and priority inversion
From: Davda, Bhavesh P (Bhavesh) @ 2005-08-18 3:43 UTC
To: Keith Mannthey; +Cc: linux-kernel
> From: Keith Mannthey [mailto:kmannth@gmail.com]
> Sent: Wednesday, August 17, 2005 5:33 PM
>
> On 8/17/05, Davda, Bhavesh P (Bhavesh) <bhavesh@avaya.com> wrote:
> > Is there a way to know which task has a particular (struct semaphore *)
> > down()ed, leading to another task's down() blocking on it?
>
> I would add a field to struct semaphore that tracks the current process.
> In your various up and downs have that field tracks the "current" process.
Yeah, I thought about that. Unfortunately, it doesn't meet my need for
not Heisenberg'ing the system. I can't instrument the struct semaphore
{} in a running system.
>
> Do you know what semaphore it is?
Yes. It is an inode->i_sem semaphore for a file being written to by the
high-priority SCHED_FIFO task.
>
> This way you dump the semaphore you can see what task it is
> holding it. Have the module dump the semaphore and you can
> id the task
>
> > It would be helpful to get a kernel stacktrace for the culprit too.
>
> Have you tried sysrq t? See the Documentation/sysrq.txt file.
This is a headless system.
>
> How stuck is the system?
>
> Keith
Very. It is only pingable; I can't log in via telnet/ssh/anything. The
reason is the same one that keeps the low-priority mystery task from
running and releasing the held semaphore.
Thanks
- Bhavesh
* RE: Debugging kernel semaphore contention and priority inversion
From: Davda, Bhavesh P (Bhavesh) @ 2005-08-18 14:50 UTC
To: Mike Galbraith; +Cc: linux-kernel
> -----Original Message-----
> From: Mike Galbraith [mailto:efault@gmx.de]
> Sent: Wednesday, August 17, 2005 11:10 PM
> At 09:43 PM 8/17/2005 -0600, you wrote:
> > >
> > > Have you tried sysrq t? See the Documentation/sysrq.txt file.
> >
> >This is a headless system.
>
> You could try netconsole.
Haven't heard of it before. Will look into it. But I doubt it will help
pinpoint the semaphore holder, if all I can do is sysrq stuff.
>
> > >
> > > How stuck is the system?
> > >
> > > Keith
> >
> >Very. Only pingable, but can't login via telnet/ssh/anything. Reason is
> >the same reason the low priority mystery task is unable to run and
> >release the held semaphore.
>
> (hmm. I'm obviously missing some original context here)
>
> Sounds like there must be another player who is RT prio + spinning.
>
> -Mike
Very good! Yes, I left that detail out of my original posting.
There is a very low priority (4) SCHED_FIFO task spinning (its priority
is still higher than that of any SCHED_OTHER task). But it is not the
semaphore holder. I am trying to identify which task, most likely a
kernel thread running at a very low SCHED_OTHER priority (too nice), is
holding the semaphore and thereby locking a priority 50 SCHED_FIFO task
out of its sys_write().
Thanks
- Bhavesh
* Re: Debugging kernel semaphore contention and priority inversion
From: Hal Wigoda @ 2005-08-18 15:13 UTC
To: Davda, Bhavesh P (Bhavesh); +Cc: linux-kernel, Mike Galbraith
Will netconsole install on Mandriva?
On Aug 18, 2005, at 9:50 AM, Davda, Bhavesh P ((Bhavesh)) wrote:
>> -----Original Message-----
>> From: Mike Galbraith [mailto:efault@gmx.de]
>> Sent: Wednesday, August 17, 2005 11:10 PM
>> At 09:43 PM 8/17/2005 -0600, you wrote:
>>>>
>>>> Have you tried sysrq t? See the Documentation/sysrq.txt file.
>>>
>>> This is a headless system.
>>
>> You could try netconsole.
>
> Haven't heard of it before. Will look into it. But I doubt it will help
> pinpoint the semaphore holder, if all I can do is sysrq stuff.
>
>>
>>>>
>>>> How stuck is the system?
>>>>
>>>> Keith
>>>
>>> Very. Only pingable, but can't login via telnet/ssh/anything. Reason is
>>> the same reason the low priority mystery task is unable to run and
>>> release the held semaphore.
>>
>> (hmm. I'm obviously missing some original context here)
>>
>> Sounds like there must be another player who is RT prio + spinning.
>>
>> -Mike
>
> Very good! Yes, I left out that piece of detail in my original posting.
> There is a real low priority (4) SCHED_FIFO (hence still higher than any
> SCHED_OTHER) task spinning. But it is not the semaphore holder. I am
> trying to identify which kernel thread (because that's most likely)
> running at SCHED_OTHER real low priority (too nice) is holding the
> semaphore, locking out a priority 50 SCHED_FIFO task in its sys_write()
> as a result.
>
> Thanks
>
> - Bhavesh
* RE: Debugging kernel semaphore contention and priority inversion
From: Steven Rostedt @ 2005-08-18 15:15 UTC
To: Davda, Bhavesh P (Bhavesh); +Cc: linux-kernel, Mike Galbraith
On Thu, 2005-08-18 at 08:50 -0600, Davda, Bhavesh P (Bhavesh) wrote:
> > >This is a headless system.
> >
> > You could try netconsole.
>
> Haven't heard of it before. Will look into it. But I doubt it will help
> pinpoint the semaphore holder, if all I can do is sysrq stuff.
Or does this system have a serial port? A serial console is just as good
as (if not better than) netconsole, since it is simpler, and netconsole
still needs to use the IP stack. Although, in most cases netconsole works
for me. But with a serial console you can also send commands. Netconsole
is better if you need to send lots of data and need a higher transfer
rate.
I usually set up minicom on a machine attached to the target computer,
and on the target run "cat /dev/ttyS0 > /dev/null &" just to open the
port. This is needed because the serial port is only read if something
actually has it open for reading. On the kernel command line you will
also need to add console=ttyS0,115200n8 (change the baud rate to whatever
you use; here I use 115200). From minicom, you can send a sysrq-t with
C-a f t: the C-a f sends a break, and the t tells the kernel this is a
sysrq-t.
Read Documentation/serial-console.txt for more information.
For netconsole read: Documentation/networking/netconsole.txt
>
> >
> > > >
> > > > How stuck is the system?
> > > >
> > > > Keith
> > >
> > >Very. Only pingable, but can't login via telnet/ssh/anything. Reason is
> > >the same reason the low priority mystery task is unable to run and
> > >release the held semaphore.
> >
> > (hmm. I'm obviously missing some original context here)
> >
> > Sounds like there must be another player who is RT prio + spinning.
> >
> > -Mike
>
> Very good! Yes, I left out that piece of detail in my original posting.
> There is a real low priority (4) SCHED_FIFO (hence still higher than any
> SCHED_OTHER) task spinning. But it is not the semaphore holder. I am
> trying to identify which kernel thread (because that's most likely)
> running at SCHED_OTHER real low priority (too nice) is holding the
> semaphore, locking out a priority 50 SCHED_FIFO task in its sys_write()
> as a result.
Also have a look at Ingo Molnar's RT patch. It takes care of priority
inversion (with priority inheritance) and also has lots of other nifty
features to debug semaphores and locks.
You can find it here:
http://people.redhat.com/mingo/realtime-preempt/
-- Steve
* Re: Debugging kernel semaphore contention and priority inversion
From: Keith Mannthey @ 2005-08-18 17:39 UTC
To: Davda, Bhavesh P (Bhavesh); +Cc: linux-kernel
On 8/17/05, Davda, Bhavesh P (Bhavesh) <bhavesh@avaya.com> wrote:
> > From: Keith Mannthey [mailto:kmannth@gmail.com]
> > Sent: Wednesday, August 17, 2005 5:33 PM
> >
> > On 8/17/05, Davda, Bhavesh P (Bhavesh) <bhavesh@avaya.com> wrote:
> > > Is there a way to know which task has a particular (struct semaphore *)
> > > down()ed, leading to another task's down() blocking on it?
> >
> > I would add a field to struct semaphore that tracks the current process.
> > In your various up and downs have that field tracks the "current" process.
>
> Yeah, I thought about that. Unfortunately, it doesn't meet my need for
> not Heisenberg'ing the system. I can't instrument the struct semaphore
> {} in a running system.
What kernel are you using?
Can you do some form of a crash dump (maybe some diskdump thing)?
It is hard to debug without instrumentation of some kind. You are
most likely going to have to rebuild/change your current kernel to
sort this issue out.
> > This way you dump the semaphore you can see what task it is
> > holding it. Have the module dump the semaphore and you can
> > id the task
> >
> > > It would be helpful to get a kernel stacktrace for the culprit too.
> >
> > Have you tried sysrq t? See the Documentation/sysrq.txt file.
>
> This is a headless system.
How do you know you are spinning on some inode semaphore? If all you
know is that the headless system is unresponsive, how do you know you
are dealing with a priority inversion issue? Maybe the system has
panicked, or something else entirely.
It seems to me you might be jumping to conclusions.
> >
> > How stuck is the system?
> >
> > Keith
>
> Very. Only pingable, but can't login via telnet/ssh/anything. Reason is
> the same reason the low priority mystery task is unable to run and
> release the held semaphore.
From the present state you have described, you would be unable to
load a module or interact with the box in any way. It is really hard to
debug a kernel without a console. As others have suggested, a serial
console or netconsole would help a bunch.
Good luck!
Keith
* RE: Debugging kernel semaphore contention and priority inversion
From: Davda, Bhavesh P (Bhavesh) @ 2005-08-18 17:50 UTC
To: Keith Mannthey; +Cc: linux-kernel
> -----Original Message-----
> From: Keith Mannthey [mailto:kmannth@gmail.com]
> Sent: Thursday, August 18, 2005 11:40 AM
> On 8/17/05, Davda, Bhavesh P (Bhavesh) <bhavesh@avaya.com> wrote:
> > > From: Keith Mannthey [mailto:kmannth@gmail.com]
> > > Sent: Wednesday, August 17, 2005 5:33 PM
> > >
> > > On 8/17/05, Davda, Bhavesh P (Bhavesh) <bhavesh@avaya.com> wrote:
> > > > Is there a way to know which task has a particular (struct
> > > > semaphore *) down()ed, leading to another task's down() blocking on it?
> > >
> > > I would add a field to struct semaphore that tracks the current
> > > process.
> > > In your various up and downs have that field tracks the "current"
> > > process.
> >
> > Yeah, I thought about that. Unfortunately, it doesn't meet my need for
> > not Heisenberg'ing the system. I can't instrument the struct semaphore
> > {} in a running system.
>
> What kernel are you using?
2.6.11.12 from kernel.org
> Can you do some form of a crash dump (maybe some diskdump thing)?
> It is hard to debug without insturmentation of some kind....
> You are most likely going to have to rebuild/change your
> current kernel to sort this issue out....
I've written a trivial debug module that can poke around in kernel data
structures. Currently it dumps out any task's user space registers, some
flags in its signal structure, and its kernel stack.
I'm considering enhancing it to get to the inode->i_sem semaphore I
suspect and dump out its contents too.
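A rough sketch of what such a dump can and cannot show, written as a
user-space model rather than module code. It assumes the i386 layout of
that era, in which struct semaphore holds a count, a sleepers field and a
wait queue; struct sem_snapshot and classify() are hypothetical names, and
the reading of count assumes a semaphore initialised to 1.

```c
#include <stdio.h>

/* Hypothetical copy of the two integer fields a debug module might read
 * out of a semaphore. Assumption: i386-style layout (count, sleepers,
 * wait queue). Nothing in it names the holder. */
struct sem_snapshot {
    int count;     /* 1 = free, for a semaphore initialised to 1 */
    int sleepers;  /* bookkeeping used by the slow path */
};

enum sem_state { SEM_FREE, SEM_HELD, SEM_CONTENDED };

/* Interpret a snapshot. The values can be transitional if the dump races
 * with an up() or down(), so treat the result as a hint. */
enum sem_state classify(const struct sem_snapshot *s)
{
    if (s->count > 0)
        return SEM_FREE;        /* nobody holds it */
    if (s->count == 0)
        return SEM_HELD;        /* taken, no waiter recorded in count */
    return SEM_CONTENDED;       /* negative: at least one task is waiting */
}

void print_snapshot(const struct sem_snapshot *s)
{
    static const char *names[] = { "free", "held", "held, with waiters" };

    printf("count=%d sleepers=%d -> %s\n",
           s->count, s->sleepers, names[classify(s)]);
}
```

Under these assumptions the dump can confirm that the semaphore is held
and contended, but it does not identify the holder, which is the gap the
rest of this thread is about.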
> > > This way you dump the semaphore you can see what task it is holding
> > > it. Have the module dump the semaphore and you can id the task
> > >
> > > > It would be helpful to get a kernel stacktrace for the culprit too.
> > >
> > > Have you tried sysrq t? See the Documentation/sysrq.txt file.
> >
> > This is a headless system.
>
> How do you know you are spinning on some inode semaphore?
> If the system is only headless how do you know you are
> dealing with some priority inversion issue? Maybe the system
> has a panic or ????
I can boost a remote ssh shell up to SCHED_RR/SCHED_FIFO priority 99. I
know that the root cause of this priority inversion/starvation issue is
a bug in a priority 4 SCHED_FIFO task, but I want to get to the bottom of
who is being starved.
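For reference, a small sketch of how a task can be moved to SCHED_FIFO
from user space with sched_setscheduler(). The helper names
clamp_fifo_priority() and boost_to_fifo() are made up for illustration,
and this is not necessarily how the shell above was boosted.

```c
#include <sched.h>
#include <sys/types.h>

/* Clamp a requested priority into the range the system allows for
 * SCHED_FIFO (1..99 on Linux). */
int clamp_fifo_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/* Move task `pid` (0 means the caller) to SCHED_FIFO at `prio`.
 * Returns 0 on success, or -1 with errno set; EPERM is the usual
 * failure when the caller lacks the privilege to do this. */
int boost_to_fifo(pid_t pid, int prio)
{
    struct sched_param sp = { .sched_priority = clamp_fifo_priority(prio) };

    return sched_setscheduler(pid, SCHED_FIFO, &sp);
}
```

Calling boost_to_fifo(0, 99) from a privileged process puts it ahead of
both the priority 4 spinner and the priority 50 task. The chrt utility
does the same from a shell, for example "chrt -f -p 99 <pid>".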
>
> It seems to me you might be jumping to conclusions.
I don't think so. I've unwound the kernel stack of the blocked task (as
dumped out by my debug module) and determined that it is stuck in a
__down() on the inode->i_sem semaphore of a file I can identify (the fd
is on the kernel stack, and I know which file it is from
/proc/pid/task/tid/fd).
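The fd-to-file step can be sketched as follows, assuming a /proc that
exposes per-thread descriptor links as described above. fd_to_path() is a
hypothetical helper that simply reads the symlink.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve descriptor `fd` of thread `tid` in process `pid` to a path by
 * reading the /proc/<pid>/task/<tid>/fd/<fd> symlink.
 * Returns the path length, or -1 on failure (errno set by readlink). */
ssize_t fd_to_path(pid_t pid, pid_t tid, int fd, char *buf, size_t len)
{
    char link[96];
    ssize_t n;

    if (len == 0)
        return -1;
    snprintf(link, sizeof(link), "/proc/%ld/task/%ld/fd/%d",
             (long)pid, (long)tid, fd);
    n = readlink(link, buf, len - 1);
    if (n >= 0)
        buf[n] = '\0';  /* readlink() does not terminate the string */
    return n;
}
```

Given the pid and tid of the blocked writer and the fd recovered from its
stack, the returned path names the file whose inode semaphore is
contended.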
>
> > >
> > > How stuck is the system?
> > >
> > > Keith
> >
> > Very. Only pingable, but can't login via telnet/ssh/anything. Reason
> > is the same reason the low priority mystery task is unable to run and
> > release the held semaphore.
>
> From the present state you have described you would be
> unable to load a module or interact with the box in anyway.
> It is really hard to debug a kernel without a console. As
> others have suggested a serial console/net console would help a bunch.
>
> Good luck!
>
> Keith
>
I have hooked up a console, but what good is that going to do for
debugging? As I said, sysrq stuff is not enough to get to the bottom of
this. My real question is if anybody knows of other signatures to look
for in the running tasks, to know which one might be holding the
inode->i_sem semaphore, so I can do something about that task.
Thanks
- Bhavesh
* RE: Debugging kernel semaphore contention and priority inversion
From: Mike Galbraith @ 2005-08-19 5:16 UTC
To: Davda, Bhavesh P (Bhavesh); +Cc: linux-kernel
At 08:50 AM 8/18/2005 -0600, Davda, Bhavesh P \(Bhavesh\) wrote:
>
> > Sounds like there must be another player who is RT prio + spinning.
>
>Very good! Yes, I left out that piece of detail in my original posting.
>There is a real low priority (4) SCHED_FIFO (hence still higher than any
>SCHED_OTHER) task spinning.
That's a (fairly) deadly bug.
-Mike