public inbox for linux-kernel@vger.kernel.org
* System lockup with processes in D state in 2.6.16.1
@ 2006-03-30 22:41 Robert Mueller
  2006-03-31  0:58 ` Hans Reiser
  2006-04-01 23:00 ` Oleg Drokin
  0 siblings, 2 replies; 3+ messages in thread
From: Robert Mueller @ 2006-03-30 22:41 UTC (permalink / raw)
  To: linux-kernel, Reiserfs developers mail-list, Oleg Drokin,
	Chris Mason
  Cc: Bron Gondwana, Jeremy Howard

Some time ago, we were seeing a problem in the kernel in the reiserfs code 
where a lock inversion issue could cause processes to get stuck in D state, 
requiring a system reboot.

http://marc.theaimsgroup.com/?t=108932517300001&r=1&w=2

This link describes the actual call path that causes the problem.
http://marc.theaimsgroup.com/?l=linux-kernel&m=109035413201491&w=2

At the time, the solution we got was to add a patch that basically bypasses
reiser_file_write and just calls generic_file_write. This seemed to fix the
problem, and we've been running fine with that patch for over a year.

Recently we brought the issue up again with some reiser people, who mentioned
that:

> There was a patch for the problem referenced by this link. (By Cris Mason,
> I think). This patch is long included into vanilla kernel
> (2.6.15 certainly contains it). If you still see deadlocks, I guess you
> need to gather some more info again (sysrq-t and friends).

So we recently built 2.6.16.1 without the patch. However, after just 1 hour
of stress testing with cyrus again, we were able to lock up the system, with
lots of processes stuck in D state and the load running to 500+. The machine
had about 1500 processes running on it and the dmesg buffer was only 1M, so
it seems we weren't able to capture all the traces with a sysrq-t, but
there's still a lot of info. I've put the sysrq-t and kernel config output
at the links below:

http://kernel.robm.fastmail.fm/sysrq-t-2006-03-30-1.txt
http://kernel.robm.fastmail.fm/kernel-config-2006-03-30-1.txt
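
A minimal sketch of how a pile-up of D-state processes like the one described
above could be counted from userspace, assuming a Linux /proc layout where
/proc/<pid>/stat has the form "pid (comm) state ...". The function names
parse_state and d_state_processes are hypothetical, and this is not the
tooling that produced the traces linked above.

```python
import os


def parse_state(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')' instead of splitting on spaces.
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")
    pid = int(stat_line[:lparen].strip())
    comm = stat_line[lparen + 1:rparen]
    state = stat_line[rparen + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """List (pid, comm) for every process in uninterruptible sleep (D)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while scanning, or stat was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    stuck = d_state_processes()
    print(f"{len(stuck)} processes in D state")
    for pid, comm in stuck:
        print(pid, comm)
```

Run periodically during a stress test, a count like this only shows when
processes start to accumulate in D state; it does not say which lock they are
waiting on, so a sysrq-t trace is still needed for that.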

Any idea if this is related to the previous problem or is something 
different?

Rob


