Subject: reiserfs3 bug?
From: Jason Keltz @ 2006-04-05 14:39 UTC
  To: reiserfs-list

Our Linux 2.4.32 NFS fileserver exports 4 reiserfs 3.6 filesystems to a
whole bunch of hosts.  Every 6 to 30 days or so, NFS suddenly seems to
"hang" (i.e., all the hosts get the "nfs server not responding"
message).  The server itself is still up (we can ssh to it, etc.).  Up
to this point, we've assumed it was a bug in NFS.  We recently installed
SGI kdb (the kernel debugger) to help debug the problem, and we're now
wondering whether it is actually reiserfs related.  We need to capture
the output from several of these hangs in order to debug further,
although we believe the problem is probably the same each time.

In a normal state, the WCHAN column of "ps" output lists the nfsd and
[kreiserfsd] processes as "end".  When the system gets into this state,
both the nfsd and [kreiserfsd] processes report "down" for WCHAN.
(I can imagine that if kreiserfsd and nfsd are both waiting on the same
lock, bad things could happen.)

When the problem occurs, the kdb backtrace for kreiserfsd is:

Mar 28 13:04:24 0xf635a000      247        1  0    1   D  0xf635a370 kreiserfsd
Mar 28 13:04:24 ESP        EIP        Function (args)
Mar 28 13:04:24 0xf635bef8 0xc011b144 schedule+0x2b4 (0xc0452e40, 0x0, 0xf635a000, 0xd4a10970, 0xd4a10970)
Mar 28 13:04:24                                kernel .text 0xc0100000 0xc011ae90 0xc011b3d0
Mar 28 13:04:24 0xf635bf40 0xc014680e __wait_on_buffer+0x6e (0xd4a10920, 0x9e0, 0xf635bf90, 0xc2937000, 0xf8ac6000)
Mar 28 13:04:24                                kernel .text 0xc0100000 0xc01467a0 0xc0146840
Mar 28 13:04:24 0xf635bf68 0xf8943e79 [reiserfs]flush_commit_list+0x3e9
Mar 28 13:04:24                                reiserfs .text 0xf8921060 0xf8943a90 0xf8943f60
Mar 28 13:04:24 0xf635bfa8 0xf894801d [reiserfs]flush_async_commits+0x3d (0xf7576800, 0xdd667cc0, 0xf635bfd8, 0xf635bfdc, 0x20)
Mar 28 13:04:24                                reiserfs .text 0xf8921060 0xf8947fe0 0xf8948020
Mar 28 13:04:25 0xf635bfb8 0xf894652b [reiserfs]reiserfs_journal_commit_thread+0x1db
Mar 28 13:04:25                                reiserfs .text 0xf8921060 0xf8946350 0xf89465f0
Mar 28 13:04:25 0xf635bff4 0xc010741e arch_kernel_thread+0x2e
Mar 28 13:04:25                                kernel .text 0xc0100000 0xc01073f0 0xc0107430

Single-stepping the processor after the problem occurs shows that the
system is "idle" and not doing anything else related to this.

I won't include the backtraces of the 256 nfsd processes on our
fileserver here, but they probably tell a lot more of the story.  If you
are interested, the full details are at:

http://www.cs.yorku.ca/~jas/fileserver

If anyone has any ideas, or can suggest where we could insert debugging
code to help track this down, we would *really* appreciate your help!

We recently upgraded from 2.4.26 to 2.4.32 in the hope that the bug had
been fixed, but it made no difference.

P.S. A few times, when we issued the "reboot" command, the client hosts
got "unstuck" (they reported "nfs ok") just before the server rebooted.
Whatever is stuck seems to get released for a moment before the system
goes down.

Thanks.

Jason Keltz
jas@cs.yorku.ca


Thread overview: 4+ messages
2006-04-05 14:39 reiserfs3 bug? Jason Keltz
2006-04-20 13:27 ` reiserfs3 bug? 2.4.32 Jason Keltz
2006-04-21  7:47   ` Vladimir V. Saveliev
2006-04-21 12:42     ` Jason Keltz