* NFS crash on Suse, any ideas on which normal nfs patch could be a cause/fix?
@ 2005-06-09 18:50 Roger Heflin
  2005-06-10  7:56 ` Olaf Kirch
  0 siblings, 1 reply; 5+ messages in thread
From: Roger Heflin @ 2005-06-09 18:50 UTC
  To: nfs


Hello,
 
I have an NFS-related crash on Suse and have a support call open with Suse.
What I really want to know is whether any patch has been added or removed
to fix an issue similar to this one.  We are using 2.6.5-7.139, which is of
course a Suse-patched kernel, and I do not know exactly what it was patched
with.  We are running NFSv3 over UDP with a 32k block size.
 
NFS did survive several large bonnie runs against the same NFS servers, with
no symptoms similar to these.  The server also seems fine when the event
happens; it appears to be a client-only problem, since other clients can
still reach the server in use just fine.
 
The description is below.  I have been looking through the recent NFS
patches and have come up with the following one that looks possible.
However, I have looked at the code it patches and do not see the code it
deletes in the Suse kernel, so I don't think that is it.
  

 
<http://client.linux-nfs.org/Linux-2.6.x/2.6.7-rc3/linux-2.6.7-01-write_hang.dif>
linux-2.6.7-01-write_hang.dif:

NFS: remove the WRITEPAGE_ACTIVATE hack. It causes crashes.

		
The customer untars an 8GB file located on an NFS share back onto the same
NFS share.  They have done this on at least 4 separate machines with
slightly different configurations, and the results differed from machine to
machine.  On 1 of the 4 machines it succeeded; on the other 3 it failed.
Of those 3, 1 machine locked up and required power removal, 1 machine gave
us a kernel panic, and 1 machine put the tar process into a permanent
unkillable "D" state.  The customer has also experienced similar issues
doing NFS reads/writes with applications other than tar: similar results
were obtained with a user test program and with third-party application
software.
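
For anyone trying to reproduce the workload described above, here is a
minimal sketch in Python.  It is not the customer's test program; the
function name and any paths passed to it are hypothetical.  It only mimics
the access pattern: read a tar archive that lives on a share and write its
contents back onto the same share.

```python
import tarfile
from pathlib import Path


def untar_onto_same_share(archive: Path, dest: Path) -> int:
    """Extract `archive` into `dest` and return the number of members.

    Hypothetical helper: both paths are assumed to sit on the same NFS
    mount, so every read of the archive and every write of an extracted
    file goes through the same NFS client.
    """
    dest.mkdir(parents=True, exist_ok=True)
    count = 0
    with tarfile.open(archive) as tar:
        # Extract member by member so a hang can be tied to a specific file.
        for member in tar:
            tar.extract(member, path=dest)
            count += 1
    return count
```

A call such as untar_onto_same_share(Path("/mnt/share/big.tar"),
Path("/mnt/share/out")) would exercise the pattern, where /mnt/share is an
assumed mount point for the NFSv3/UDP share.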

The kernel crash message was:

CPU 0: Machine Check Exception:          4 Bank 4: f60da00100000813
  RIP !INEXACT!  10:<ffffffff8022fae2> {copy_user_generic_c+0x8/0x26}
  TSC 3d101af6bf756 ADDR 4304010
  Kernel panic: Machine check

So far we have seen this on only one of the machines.
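
The status word in the message above can be split into its architectural
flag bits.  The sketch below assumes the standard x86 machine-check layout
of an MCi_STATUS register (bits 63 down to 57 are flags, bits 15:0 are the
MCA error code).  The function and table names are made up for
illustration, and the model-specific bits in between are not interpreted
here; those need the processor vendor's documentation.

```python
# Architectural flag bits of an x86 MCi_STATUS value (bit -> name, meaning).
STATUS_FLAGS = {
    63: ("VAL", "register contents are valid"),
    62: ("OVER", "an earlier error was overwritten"),
    61: ("UC", "error was not corrected by hardware"),
    60: ("EN", "reporting of this error was enabled"),
    59: ("MISCV", "MISC register holds valid data"),
    58: ("ADDRV", "ADDR register holds a valid address"),
    57: ("PCC", "processor context may be corrupt"),
}


def decode_mci_status(status: int) -> dict:
    """Return the set flag names (high bit first) and the 16-bit MCA code."""
    flags = [
        name
        for bit, (name, _meaning) in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {"flags": flags, "mca_error_code": status & 0xFFFF}


# Status value copied from the "Bank 4" line of the crash message.
decoded = decode_mci_status(0xF60DA00100000813)
print(", ".join(decoded["flags"]))
print(f"MCA error code: 0x{decoded['mca_error_code']:04x}")
```

For the value in the crash message this prints VAL, OVER, UC, EN, ADDRV,
PCC and an MCA error code of 0x0813.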

Any ideas on what is happening?  All machines are Opteron machines: some
are dual-processor machines with one motherboard, and the others are
quad-processor models with a different motherboard.  Since there is some
variation in hardware, it looks like a software issue.  It also does not
look like broken hardware, as the machines don't fail on anything but
these NFS tests.
 
The same commands appear to work on local filesystems.
 
                                     


Thread overview: 5+ messages
2005-06-09 18:50 NFS crash on Suse, any ideas on which normal nfs patch could be a cause/fix? Roger Heflin
2005-06-10  7:56 ` Olaf Kirch
2005-06-10 13:15   ` Roger Heflin
2005-06-10 20:53   ` Roger Heflin
2005-06-14 11:06     ` Olaf Kirch
