* Processes stuck in D state with autofs + smbfs
@ 2002-05-29 17:26 Mike Fedyk
2002-05-30 12:36 ` Urban Widmark
0 siblings, 1 reply; 8+ messages in thread
From: Mike Fedyk @ 2002-05-29 17:26 UTC
To: linux-kernel
Hi,
I've been having a recurring problem ever since the 2.2 kernel days between
smbfs and autofs.
I'm currently running 2.4.19-pre6-vm33 on this 2x664MHz P3 machine, but I've
also had the problem on the previous UP machine.
I'm not sure what information will be helpful in debugging this problem.
Would sysrq+t run through ksymoops be helpful?
I also have this in my kernel log:
May 26 06:33:16 fileserver kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
May 26 06:33:16 fileserver kernel: You probably have a hardware problem with your RAM chips
I did a quick search and it looks like this might be a memory parity error.
Right now this system runs the scripts that trigger the condition, but I
can do tests on a different machine.
Is there anyone interested in this problem?
Oh, I'm running debian and the samba packages are:
ii samba 2.2.3a-6 A LanManager like file and printer server for Unix.
ii smbfs 2.2.3a-6 mount and umount commands for the smbfs (for kernels >= than
Thanks,
Mike
* Re: Processes stuck in D state with autofs + smbfs
2002-05-29 17:26 Processes stuck in D state with autofs + smbfs Mike Fedyk
@ 2002-05-30 12:36 ` Urban Widmark
2002-05-30 19:18 ` Denis Vlasenko
2002-05-30 20:03 ` Mike Fedyk
0 siblings, 2 replies; 8+ messages in thread
From: Urban Widmark @ 2002-05-30 12:36 UTC
To: Mike Fedyk; +Cc: linux-kernel
On Wed, 29 May 2002, Mike Fedyk wrote:
> I'm currently running 2.4.19-pre6-vm33 on this 2x664MHz P3 machine, but I've
> also had the problem on the previous UP machine.
>
> I'm not sure what information will be helpful in debugging this problem.
> Would sysrq+t run through ksymoops be helpful?
Yes, it could show where the process is stuck. Probably what has happened
is that some process is blocked while holding the smbfs semaphore (there
is one per mount).
All others will then get stuck in 'D' state trying to get that semaphore.
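To see which processes are affected before reaching for sysrq+t, one option is to scan /proc for tasks whose state field is 'D'. The following is a minimal sketch, assuming a Linux /proc layout where /proc/<pid>/stat begins with "pid (comm) state"; the function names parse_stat_state and find_d_state are hypothetical and not part of any tool mentioned in this thread.

```python
import os


def parse_stat_state(stat_line):
    """Return (pid, comm, state) from the text of one /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """List (pid, comm) for every process currently in uninterruptible sleep."""
    if not os.path.isdir(proc_root):
        return []
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the file was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)


if __name__ == "__main__":
    for pid, comm in find_d_state():
        print(pid, comm)
```

This only shows which processes are stuck, not where; the sysrq+t trace is still needed to find out which one is holding the semaphore.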
The "classic" way to get this is to have a server that is shut down while
it is mounted. There are patches to help with that (and if I weren't so
slow sometimes, a simple fix would already be in 2.4.something; after
2.4.19, I promise).
> I also have this in my kernel log:
> May 26 06:33:16 fileserver kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
> May 26 06:33:16 fileserver kernel: You probably have a hardware problem with your RAM chips
However, this error could (but I don't really know what the effects are of
this) potentially stop a process at some random point. If a process
crashes, for example an oops, while holding the semaphore that semaphore
will still be held and everyone trying to get in will stop in D state.
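The effect described above can be illustrated with a userspace analogy. This is a minimal sketch, assuming Python's threading.Semaphore as a stand-in for the per-mount smbfs semaphore; it is not kernel code, and the function name stuck_waiters is hypothetical. The holder acquires the semaphore and never releases it (as a crashed process would), and each waiter uses a timeout only so that the demonstration terminates. The kernel waiters in this thread have no such timeout, which is why they stay in D state.

```python
import threading


def stuck_waiters(n_waiters=3, timeout=0.2):
    """Return, per waiter, whether it managed to acquire the semaphore."""
    sem = threading.Semaphore(1)
    sem.acquire()  # the "holder" takes the semaphore and never releases it

    results = []
    lock = threading.Lock()

    def waiter():
        # A real uninterruptible wait would block forever here.
        got_it = sem.acquire(timeout=timeout)
        with lock:
            results.append(got_it)

    threads = [threading.Thread(target=waiter) for _ in range(n_waiters)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(stuck_waiters())
```

Every waiter reports failure because nothing ever releases the semaphore, regardless of how many waiters queue up behind it.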
There are some patches here:
http://www.hojdpunkten.ac.se/054/samba/index.html
But that server appears to be down right now.
There is one patch that uses poll to help with the problem of a server
that is gone, and another that changes a lot of how smbfs sends requests
and additionally makes the user processes always(?) be interruptible.
But if the NMIs are killing things at random points then none of those
patches will help.
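The idea behind using poll for a vanished server can be sketched in userspace. This is an analogy only, assuming Python's select.poll on Linux and a hypothetical helper named recv_with_timeout; it does not reproduce the smbfs patch itself. Instead of blocking in a receive that may never return, the caller first waits for readability with a bounded timeout and gives up if nothing arrives.

```python
import select
import socket


def recv_with_timeout(sock, nbytes, timeout_ms):
    """Return received bytes, or None if nothing arrived within timeout_ms."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    events = poller.poll(timeout_ms)  # list of (fd, event) pairs, empty on timeout
    if not events:
        return None  # peer is silent: the caller can fail the request
    return sock.recv(nbytes)


if __name__ == "__main__":
    client, server = socket.socketpair()
    print(recv_with_timeout(client, 16, 100))  # the "server" sends nothing
    server.sendall(b"reply")
    print(recv_with_timeout(client, 16, 100))
    client.close()
    server.close()
```

With a timeout in place, the caller gets a chance to release whatever lock it holds and return an error instead of waiting indefinitely.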
/Urban
* Re: Processes stuck in D state with autofs + smbfs
2002-05-30 12:36 ` Urban Widmark
@ 2002-05-30 19:18 ` Denis Vlasenko
2002-05-30 19:56 ` Mike Fedyk
2002-05-30 20:03 ` Mike Fedyk
1 sibling, 1 reply; 8+ messages in thread
From: Denis Vlasenko @ 2002-05-30 19:18 UTC
To: Urban Widmark, Mike Fedyk; +Cc: linux-kernel
On 30 May 2002 10:36, Urban Widmark wrote:
> > I also have this in my kernel log:
> > May 26 06:33:16 fileserver kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
> > May 26 06:33:16 fileserver kernel: You probably have a hardware problem with your RAM chips
>
> However, this error could (but I don't really know what the effects are of
> this) potentially stop a process at some random point. If a process
> crashes, for example an oops, while holding the semaphore that semaphore
> will still be held and everyone trying to get in will stop in D state.
AFAIK this message says the CPU got a spurious NMI. It does not kill the task;
the kernel logs this message and returns from the NMI interrupt handler.
What does cat /proc/interrupts tell you?
NMI may be truly spurious or a hardware failure indication. Give your box
an overnight run of memtest86.
--
vda
* Re: Processes stuck in D state with autofs + smbfs
2002-05-30 19:18 ` Denis Vlasenko
@ 2002-05-30 19:56 ` Mike Fedyk
0 siblings, 0 replies; 8+ messages in thread
From: Mike Fedyk @ 2002-05-30 19:56 UTC
To: Denis Vlasenko; +Cc: Urban Widmark, linux-kernel
On Thu, May 30, 2002 at 05:18:46PM -0200, Denis Vlasenko wrote:
> On 30 May 2002 10:36, Urban Widmark wrote:
> > > I also have this in my kernel log:
> > > May 26 06:33:16 fileserver kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
> > > May 26 06:33:16 fileserver kernel: You probably have a hardware problem with your RAM chips
> >
> > However, this error could (but I don't really know what the effects are of
> > this) potentially stop a process at some random point. If a process
> > crashes, for example an oops, while holding the semaphore that semaphore
> > will still be held and everyone trying to get in will stop in D state.
>
> AFAIK this message says the CPU got a spurious NMI. It does not kill the task;
> the kernel logs this message and returns from the NMI interrupt handler.
>
> What does cat /proc/interrupts tell you?
>
What does this tell you?
           CPU0       CPU1
  0:  106126905  106523397    IO-APIC-edge  timer
  1:       1290       1261    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          2          1    IO-APIC-edge  rtc
 16:  135638480  135641259   IO-APIC-level  eth0
 30:         12          8   IO-APIC-level  aic7xxx
 31:   16837019   16835973   IO-APIC-level  aic7xxx
NMI:          1          0
LOC:  212643560  212643582
ERR:          0
MIS:          0
> NMI may be truly spurious or a hardware failure indication. Give your box
> an overnight run of memtest86.
Yes, I planned to do that anyway, thanks.
Mike
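To watch whether the NMI count grows over time, the NMI row can be pulled out of /proc/interrupts programmatically. This is a minimal sketch, assuming the row starts with "NMI:" followed by one counter per CPU, as in the output above; the function name nmi_counts is hypothetical.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters from /proc/interrupts text, or []."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # stop at any trailing description text
                counts.append(int(field))
            return counts
    return []


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            print(nmi_counts(f.read()))
    except OSError:
        print("no /proc/interrupts on this system")
```

A counter that stays at 1 over several days would fit a one-off spurious NMI; a steadily increasing one would make a hardware check such as memtest86 more urgent.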
* Re: Processes stuck in D state with autofs + smbfs
2002-05-30 12:36 ` Urban Widmark
2002-05-30 19:18 ` Denis Vlasenko
@ 2002-05-30 20:03 ` Mike Fedyk
2002-05-30 21:41 ` Urban Widmark
1 sibling, 1 reply; 8+ messages in thread
From: Mike Fedyk @ 2002-05-30 20:03 UTC
To: Urban Widmark; +Cc: linux-kernel
On Thu, May 30, 2002 at 02:36:43PM +0200, Urban Widmark wrote:
> On Wed, 29 May 2002, Mike Fedyk wrote:
>
> > I'm currently running 2.4.19-pre6-vm33 on this 2x664MHz P3 machine, but I've
> > also had the problem on the previous UP machine.
> >
> > I'm not sure what information will be helpful in debugging this problem.
> > Would sysrq+t run through ksymoops be helpful?
>
> Yes, it could show where the process is stuck. Probably what has happened
> is that some process is blocked while holding the smbfs semaphore (there
> is one per mount).
>
> All others will then get stuck in 'D' state trying to get that semaphore.
>
> The "classic" way to get this is to have a server that is shut down while
> it is mounted. There are patches to help with that (and if I weren't so
> slow sometimes, a simple fix would already be in 2.4.something; after
> 2.4.19, I promise).
>
Yes, the remote server was shut down and caused this problem.
> > I also have this in my kernel log:
> > May 26 06:33:16 fileserver kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
> > May 26 06:33:16 fileserver kernel: You probably have a hardware problem with your RAM chips
>
> However, this error could (but I don't really know what the effects are of
> this) potentially stop a process at some random point. If a process
> crashes, for example an oops, while holding the semaphore that semaphore
> will still be held and everyone trying to get in will stop in D state.
>
I will resolve this issue soon, but don't forget that processes getting stuck
in D state has also been happening for a while on another machine.
>
> There are some patches here:
> http://www.hojdpunkten.ac.se/054/samba/index.html
>
> But that server appears to be down right now.
>
> There is one patch that uses poll to help with the problem of a server
> that is gone, and another that changes a lot of how smbfs sends requests
> and additionally makes the user processes always(?) be interruptible.
>
Do these require any changes to the samba userspace?
> But if the NMIs are killing things at random points then none of those
> patches will help.
AFAICT, no processes have been killed. I'm going to try to reproduce this
on another machine and I'll post the sysrq+t ksymoops output from that.
I'll probably have to do it next week though.
Mike
* Re: Processes stuck in D state with autofs + smbfs
2002-05-30 20:03 ` Mike Fedyk
@ 2002-05-30 21:41 ` Urban Widmark
2002-05-30 23:20 ` Mike Fedyk
0 siblings, 1 reply; 8+ messages in thread
From: Urban Widmark @ 2002-05-30 21:41 UTC
To: Mike Fedyk; +Cc: linux-kernel
On Thu, 30 May 2002, Mike Fedyk wrote:
> Yes, the remote server was shut down and caused this problem.
Then that is probably why it fails; the NMI or any other problem is less
likely. Please try:
http://www.hojdpunkten.ac.se/054/samba/smbfs-2.4.19-pre9-poll.patch
I haven't tested this particular patch, but it is a re-diff of an old one.
Should be ok with -pre6 too.
You don't need to modify samba, but you do need to enable "SMBFS Receive
timeout" in the kernel config.
/Urban
* Re: Processes stuck in D state with autofs + smbfs
2002-05-30 21:41 ` Urban Widmark
@ 2002-05-30 23:20 ` Mike Fedyk
2002-05-30 23:25 ` Urban Widmark
0 siblings, 1 reply; 8+ messages in thread
From: Mike Fedyk @ 2002-05-30 23:20 UTC
To: Urban Widmark; +Cc: linux-kernel
On Thu, May 30, 2002 at 11:41:55PM +0200, Urban Widmark wrote:
> On Thu, 30 May 2002, Mike Fedyk wrote:
>
> > Yes, the remote server was shut down and caused this problem.
>
> Then that is probably why it fails; the NMI or any other problem is less
> likely. Please try:
> http://www.hojdpunkten.ac.se/054/samba/smbfs-2.4.19-pre9-poll.patch
>
> I haven't tested this particular patch, but it is a re-diff of an old one.
> Should be ok with -pre6 too.
>
Oh, I'll update to pre9, no problem. Have any previous patches been tested
with SMP?
> You don't need to modify samba, but you do need to enable "SMBFS Receive
> timeout" in the kernel config.
Got it.
I'll give it a test and let you know.
Mike
* Re: Processes stuck in D state with autofs + smbfs
2002-05-30 23:20 ` Mike Fedyk
@ 2002-05-30 23:25 ` Urban Widmark
0 siblings, 0 replies; 8+ messages in thread
From: Urban Widmark @ 2002-05-30 23:25 UTC
To: Mike Fedyk; +Cc: linux-kernel
On Thu, 30 May 2002, Mike Fedyk wrote:
> Oh, I'll update to pre9, no problem. Have any previous patches been tested
> with SMP?
Yes.
/Urban