public inbox for linux-xfs@vger.kernel.org
* XFS stack space crashes - current status?
@ 2006-08-02 13:03 Chris Allen
  2006-08-02 18:59 ` Eric Sandeen
  2006-08-02 21:32 ` Russell Cattelan
  0 siblings, 2 replies; 3+ messages in thread
From: Chris Allen @ 2006-08-02 13:03 UTC
  To: linux-xfs

I have a box running XFS over md (RAID5) on a Fedora Core 5 2.6.17-1 kernel.

The box contains 16x750GB SATA drives combined into a single 11TB RAID5
partition using md, and this partition contains a single XFS filesystem.

I can consistently crash the box within about ten minutes with a simple
perl script that spawns 25 processes each of which loop writing random
files to the filesystem.
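The Perl script itself was not posted. As a rough, hypothetical C equivalent of the workload described above (the names `stress_write` and `writer`, the `w<N>-f<M>` file naming, and the per-file size limit are all invented for illustration, and "random files" is assumed to mean files of random bytes and random length), it could be sketched as:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical writer: child `id` writes `nfiles` files of random bytes,
 * each 1..max_bytes long, into `dir`. Returns 0 on success, -1 on error. */
static int writer(const char *dir, int id, int nfiles, size_t max_bytes)
{
    char path[4096];
    char *buf;

    assert(max_bytes > 0);                     /* needed for the modulo below */
    buf = malloc(max_bytes);
    if (buf == NULL)
        return -1;
    srand((unsigned)(getpid() ^ id));          /* different stream per child */
    for (int i = 0; i < nfiles; i++) {
        size_t len = 1 + (size_t)rand() % max_bytes;
        for (size_t j = 0; j < len; j++)
            buf[j] = (char)(rand() & 0xff);
        snprintf(path, sizeof path, "%s/w%d-f%d", dir, id, i);
        FILE *f = fopen(path, "wb");
        if (f == NULL) { free(buf); return -1; }
        if (fwrite(buf, 1, len, f) != len) { fclose(f); free(buf); return -1; }
        if (fclose(f) != 0) { free(buf); return -1; }
    }
    free(buf);
    return 0;
}

/* Fork `nproc` writers (the report used 25) and wait for all of them.
 * Returns the number of writers that failed or could not be started. */
int stress_write(const char *dir, int nproc, int nfiles, size_t max_bytes)
{
    int spawned = 0;

    for (int p = 0; p < nproc; p++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                             /* unspawned writers count as failures */
        if (pid == 0)
            _exit(writer(dir, p, nfiles, max_bytes) == 0 ? 0 : 1);
        spawned++;
    }

    int failed = nproc - spawned;
    for (int p = 0; p < spawned; p++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed++;
    }
    return failed;
}
```

Unlike the original script, each writer here stops after `nfiles` files instead of looping forever; calling `stress_write("/mnt/xfs", 25, 1000000, 1 << 20)` (mount point hypothetical) would approximate a long-running load.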

The only message I get on the console is something like this:

do_IRQ: stack overflow: 492
 <c0406460>

Once crashed, the box requires a hard reboot to rescue it (and needs to
resync the RAID array).


As the box is to be used as a production upload fileserver receiving
several hundred simultaneous uploads, I would expect to hit this problem
frequently.

So..... questions:

1. How much is known about this problem? Seeing as it is 100% reproducible,
is there any active development underway to fix it?

2. I have seen postings that say compiling a kernel with 8K stacks will
fix the problem. Is this the case? Or will I be able to trigger it again
by running 100 or 200 simultaneous writes?

3. Any suggestions as to what I should try? At present it looks like I am
stuck between finding a fix for XFS and splitting the box into 2 or 3 EXT3
partitions (which I really don't want to do). I have tried ReiserFS (max FS
size is 8TB even though the FAQ says 16TB), and JFS (jfs_fsck segfaults,
which doesn't fill me with confidence).


Many thanks for any suggestions,

Chris Allen.


* Re: XFS stack space crashes - current status?
  2006-08-02 13:03 XFS stack space crashes - current status? Chris Allen
@ 2006-08-02 18:59 ` Eric Sandeen
  2006-08-02 21:32 ` Russell Cattelan
  1 sibling, 0 replies; 3+ messages in thread
From: Eric Sandeen @ 2006-08-02 18:59 UTC
  To: Chris Allen; +Cc: linux-xfs

Chris Allen wrote:

> So..... questions:
> 
> 1. How much is known about this problem? Seeing as it is 100% reproducible,
> is there any active development underway to fix it?

XFS is a lot less stack-heavy than it used to be, but if you put enough 
IO code between sys_write and your disks, it can all add up to a problem.
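The "it can all add up" point is plain arithmetic: every layer between sys_write and the disks (VFS, XFS, md/raid5, the block layer and the disk driver) keeps its frames on the same per-thread kernel stack, so what matters is the sum along the deepest path rather than any single layer. Since each thread gets its own stack, the number of concurrent writers does not change that sum. The sketch below is hypothetical (the `struct layer` type and both function names are invented); real per-function frame sizes would have to be measured, for example with the kernel's `make checkstack` target, and summed along an observed backtrace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one layer in the I/O path and the worst-case
 * number of bytes it leaves on the kernel stack. */
struct layer {
    const char *name;
    size_t frame_bytes;
};

/* Total stack consumed by a call chain: the frames simply add up. */
size_t chain_usage(const struct layer *chain, size_t n)
{
    size_t total = 0;

    assert(chain != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += chain[i].frame_bytes;
    return total;
}

/* Bytes left after the chain on a stack of `stack_size` bytes, with
 * `reserved` bytes already taken (e.g. thread_info at the bottom).
 * A negative result means the chain would overflow the stack. */
long stack_headroom(const struct layer *chain, size_t n,
                    size_t stack_size, size_t reserved)
{
    return (long)stack_size - (long)reserved - (long)chain_usage(chain, n);
}
```

The same chain can show negative headroom with a 4K stack and plenty of room with an 8K one, which is why switching stack size helps without any change to XFS itself.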

> 2. I have seen postings that say compiling a kernel with 8K stacks will
> fix the problem. Is this the case? Or will I be able to trigger it again
> by running 100 or 200 simultaneous writes?

More threads probably won't matter.

> 3. Any suggestions as to what I should try? At present it looks like I
> am stuck between finding a fix for XFS and splitting the box into 2 or 3
> EXT3 partitions (which I really don't want to do). I have tried ReiserFS
> (max FS size is 8TB even though the FAQ says 16), and JFS (jfs_fsck
> segfaults which doesn't fill me with confidence).

If you can run w/ 8k stacks you will probably be in better shape.

If you want to do a bit of testing, go into do_IRQ() and change the 
warning threshold (STACK_WARN) to something slightly bigger, so that 
you'll get the warning message earlier, and you should also get a 
backtrace that tells you how you got there.
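The check being described can be modeled in userspace. This is a sketch, not kernel code, written from my understanding of the 2.6-era i386 do_IRQ() check that is compiled in with CONFIG_DEBUG_STACKOVERFLOW (4K stacks being selected by CONFIG_4KSTACKS): kernel stacks are THREAD_SIZE-aligned with struct thread_info at the bottom, so the stack pointer's offset within the stack minus sizeof(struct thread_info) is the number of free bytes, and the warning fires when that falls below STACK_WARN (THREAD_SIZE/8 by default, as far as I know). The function names, and passing the thread_info size in as a parameter, are assumptions for illustration. On that reading, the "492" in the report would be the bytes left when the interrupt arrived.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Free bytes between the stack pointer and the end of thread_info,
 * assuming the stack is thread_size-aligned with thread_info at its base. */
long stack_free_bytes(unsigned long sp, unsigned long thread_size,
                      unsigned long thread_info_size)
{
    assert(thread_size != 0 && (thread_size & (thread_size - 1)) == 0);
    return (long)(sp & (thread_size - 1)) - (long)thread_info_size;
}

/* Models the warning: `warn_threshold` plays the role of STACK_WARN.
 * Prints a message in the same format as the report and returns true
 * when it would fire. */
bool would_warn(unsigned long sp, unsigned long thread_size,
                unsigned long thread_info_size, unsigned long warn_threshold)
{
    long free_bytes = stack_free_bytes(sp, thread_size, thread_info_size);

    if (free_bytes < (long)warn_threshold) {
        printf("do_IRQ: stack overflow: %ld\n", free_bytes);
        return true;
    }
    return false;
}
```

With 4K stacks and a 512-byte threshold, `would_warn` fires at 492 free bytes but not at 792; raising the threshold to 1024 makes it fire at 792 as well, which is the "get the warning earlier" effect of a bigger STACK_WARN.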

-Eric


* Re: XFS stack space crashes - current status?
  2006-08-02 13:03 XFS stack space crashes - current status? Chris Allen
  2006-08-02 18:59 ` Eric Sandeen
@ 2006-08-02 21:32 ` Russell Cattelan
  1 sibling, 0 replies; 3+ messages in thread
From: Russell Cattelan @ 2006-08-02 21:32 UTC
  To: Chris Allen; +Cc: xfs

Chris Allen wrote:
> I have a box running XFS over md (raid5) over Fedora core5 2.6.17-1 kernel.
>
> The box contains 16x750GB SATA drives combined into a single 11TB raid5
> partition using md, and this partition contains a single XFS filesystem.
>
> I can consistently crash the box within about ten minutes with a simple
> perl script that spawns 25 processes each of which loop writing random
> files to the filesystem.
Yeah, md with RAID5 and XFS is not really happy with 4K stacks.
I never bothered to spend the time to track down which might be
the worst offenders.
It's not really XFS that is the problem here but the combination
of all the drivers you have stacked up.


You might try turning on 8K stacks and all the stack debugging routines
that will dump the stack when you go over a preset threshold.

Which SCSI driver are you using?



