BUG() in end_page_writeback(), stack overflows and system speed decrease with XFS over USB

public inbox for linux-xfs@vger.kernel.org

* BUG() in end_page_writeback(), stack overflows and system speed decrease with XFS over USB
@ 2009-11-19  8:57 Juergen Urban
  2009-11-19 18:00 ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: Juergen Urban @ 2009-11-19  8:57 UTC (permalink / raw)
  To: xfs

Hello,

my machine has been very unstable since I started using XFS on an external USB
hard disk (855 GByte XFS partition on a 1 TByte disk). One problem was stack
overflows caused by the large stack usage of XFS, USB, SCSI and VFS in Linux
2.6.23.13. NFS on XFS caused many more stack overflows. I think I got around
the stack overflows by disabling preemption, SMP and NFS in the kernel, but I
am not sure. I think I have not received a message from the stack overflow
detection since then. I also tried a Live-CD (KNOPPIX), but it shows the same
problems. I exchanged some of the hardware. XFS is decreasing system
performance. I use the Linux VDR with DVB-S, which seems to make the problems
worse. I was able to record 3 high-bandwidth streams in parallel before
using XFS. Now it has trouble recording a single high-bandwidth stream. The
system became somewhat usable after I changed the I/O scheduler to deadline.
It is difficult to get a good backtrace of the kernel crash, because the log
is not saved on the internal hard disk (reiserfs and ext3). I was able to find
out that XFS triggers a BUG() in end_page_writeback() at mm/filemap.c:552:

void end_page_writeback(struct page *page)
{
        if (!TestClearPageReclaim(page) || rotate_reclaimable_page(page)) {
                if (!test_clear_page_writeback(page))
                        BUG();
        }
        smp_mb__after_clear_bit();
        wake_up_page(page, PG_writeback);
}

The backtrace looks like this (sorry, I had to copy it down from the screen
and I don't have everything):

end_page_writeback()
end_buffer_async_write()
update_stats_wait_end()
xfs_setfilesize()
xfs_???_dealloc()
xfs_destroy_ioend()
run_workqueue()

After searching in the code I found:
/* TODO: cleanup count and page_dirty */

It seems that page_dirty may be handled wrongly and could cause the problem,
but I don't know the purpose of this code. The same comment is in the latest
source code from git.
After running the system for a while, I was able to trigger the kernel crash
by running "sync" on the command line.
My stack traces often include dvb_dmx_swfilter_packets(), do_IRQ()/tasklets
and sys_write()/vfs_write(). I can't scroll up in most situations.
Can anyone help me?
Is there an easy way to back up the data or replace the file system without
a kernel crash in between?

Best regards
Juergen Urban

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: BUG() in end_page_writeback(), stack overflows and system speed decrease with XFS over USB
  2009-11-19  8:57 BUG() in end_page_writeback(), stack overflows and system speed decrease with XFS over USB Juergen Urban
@ 2009-11-19 18:00 ` Eric Sandeen
  2009-11-20 16:23   ` Juergen Urban
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2009-11-19 18:00 UTC (permalink / raw)
  To: Juergen Urban; +Cc: xfs

Juergen Urban wrote:
> Hello,
> 
> my machine is running very unstable since I use XFS on an external USB 
> harddisc (855 GByte XFS partition on 1TByte). One problem was the stack 
> overflows caused by the large stack use of XFS, USB, SCSI and VFS in Linux 
> 2.6.23.13. NFS on XFS caused much more stack overflows. I think I got around 
> the stack overflows by disabling preemption, SMP and NFS in Linux, but I am not 
> sure about it. I think that I didn't got a message from the stack overflow 
> detection after this. 

Are you on 4k stacks?  To be honest I'd still expect things to be mostly
ok stack-wise even if so.

> I also tried a Live-CD (KNOPPIX), but there are the same 
> problems. I exchanged some of the hardware. XFS is decreasing system 
> performance.  I use the Linux VDR with DVB-S which seems to increase the 
> problems. I was able to record 3 high bandwidth streams in parallel before 
> using XFS. 

Really, you could record 3 parallel high-def TV streams to ext3 via USB?
I guess I'm a little surprised...

> Now it has problems to record one high bandwidth stream.  The 
> system got a little bit usable after I changed the IO scheduler to deadline.
> It is difficult to get a good backtrace of the kernel crash, because the backlog 
> is not saved on the internal harddisc (reiserfs and ext3). I was able to find 
> out that XFS triggers a BUG() in end_page_writeback() at mm/filemap.c:552:
> 
> void end_page_writeback(struct page *page)
> {
>         if (!TestClearPageReclaim(page) || rotate_reclaimable_page(page)) {
>                 if (!test_clear_page_writeback(page))
>                         BUG();
>         }
>         smp_mb__after_clear_bit();
>         wake_up_page(page, PG_writeback);
> }

Regarding the bug, if there is any way to test a kernel newer than .23,
I'd start there; I don't know offhand of a bug that was fixed here, but
.23 was a long time ago...

> The backtrace looks like this (Sorry, I needed to write it down from screen 
> and I don't have everything):
> 
> end_page_writeback()
> end_buffer_async_write()
> update_stats_wait_end()
> xfs_setfilesize()
> xfs_???_dealloc()
> xfs_destroy_ioend()
> run_workqueue()
> 
> After searching in the code I found:
> /* TODO: cleanup count and page_dirty */
> 
> It seems that page_dirty may be handled wrong and could cause the problem, but 
> I don't know the purpose of this stuff. The same comment is in the latest 
> source code from GIT.
> After running the system for while, I was able to trigger the kernel crash by 
> starting "sync" in the command line.
> My stack traces includes often dvb_dmx_swfilter_packets(), do_IRQ()/tasklets 
> and sys_write()/vfs_write(). I can't scroll up in most situations.
> Can anyone help me?
> Is there an easy way to backup the data or replace the file system without 
> kernel crash in between?

You should certainly be able to copy data off xfs via usb; if it's
failing, I guess we'll need more info to find out why, but I'd suggest
at least booting a newer livecd to do that copy and see if things fare
better.

-Eric

> Best regards
> Juergen Urban

* Re: BUG() in end_page_writeback(), stack overflows and system speed decrease with XFS over USB
  2009-11-19 18:00 ` Eric Sandeen
@ 2009-11-20 16:23   ` Juergen Urban
  2009-11-20 16:36     ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: Juergen Urban @ 2009-11-20 16:23 UTC (permalink / raw)
  To: xfs

On Thursday 19 November 2009 19:00:16 Eric Sandeen wrote:
> Juergen Urban wrote:
> > Hello,
> >
> > my machine is running very unstable since I use XFS on an external USB
> > harddisc (855 GByte XFS partition on 1TByte). One problem was the stack
> > overflows caused by the large stack use of XFS, USB, SCSI and VFS in
> > Linux 2.6.23.13. NFS on XFS caused much more stack overflows. I think I
> > got around the stack overflows by disabling preemption, SMP and NFS in
> > Linux, but I am not sure about it. I think that I didn't got a message
> > from the stack overflow detection after this.
>
> Are you on 4k stacks?  To be honest I'd still expect things to be mostly
> ok stack-wise even if so.

No, I am using 8k stacks.

>
> > I also tried a Live-CD (KNOPPIX), but there are the same
> > problems. I exchanged some of the hardware. XFS is decreasing system
> > performance.  I use the Linux VDR with DVB-S which seems to increase the
> > problems. I was able to record 3 high bandwidth streams in parallel
> > before using XFS.
>
> Really, you could record 3 parallel high-def TV streams to ext3 via USB?
> I guess I'm a little surprised...
>

No, I meant that I was able to record 3 high-bandwidth SDTV streams on the
internal hard disk with ext3. Then I got an external USB drive and
formatted it with XFS, because someone told me that XFS runs stably with
VDR on an internal hard disk.

> > Now it has problems to record one high bandwidth stream.  The
> > system got a little bit usable after I changed the IO scheduler to
> > deadline. It is difficult to get a good backtrace of the kernel crash,
> > because the backlog is not saved on the internal harddisc (reiserfs and
> > ext3). I was able to find out that XFS triggers a BUG() in
> > end_page_writeback() at mm/filemap.c:552:
> >
> > void end_page_writeback(struct page *page)
> > {
> >         if (!TestClearPageReclaim(page) || rotate_reclaimable_page(page)) {
> >                 if (!test_clear_page_writeback(page))
> >                         BUG();
> >         }
> >         smp_mb__after_clear_bit();
> >         wake_up_page(page, PG_writeback);
> > }
>
> Regarding the bug, if there is any way to test a kernel newer than .23,
> I'd start there; I don't know offhand of a bug that was fixed here, but
> .23 was a long time ago...

Now I have tried linux-2.6.31.6. My system hangs in the start scripts. Maybe
this is caused by the network scripts. I got a message that ehci_hcd needs to
be loaded before uhci_hcd and ohci_hcd, so I skipped uhci_hcd and ohci_hcd in
/etc/discover.conf. Now I get higher performance with linux-2.6.23.13 and I
can record 3 normal streams in parallel on the external drive with USB and
XFS. But it is still unstable. The last error I got was in
block_prepare_write (fs/buffer.c). This caused follow-up errors in
do_invalidate_page() called by xfs_get_blocks().
Sometimes there are file system deadlocks. I can do everything except access
the file system. Every attempt to access the file system deadlocks the
program. This normally happens after a kernel exception.

>
> > The backtrace looks like this (Sorry, I needed to write it down from
> > screen and I don't have everything):
> >
> > end_page_writeback()
> > end_buffer_async_write()
> > update_stats_wait_end()
> > xfs_setfilesize()
> > xfs_???_dealloc()
> > xfs_destroy_ioend()
> > run_workqueue()
> >
> > After searching in the code I found:
> > /* TODO: cleanup count and page_dirty */
> >
> > It seems that page_dirty may be handled wrong and could cause the
> > problem, but I don't know the purpose of this stuff. The same comment is
> > in the latest source code from GIT.
> > After running the system for while, I was able to trigger the kernel
> > crash by starting "sync" in the command line.
> > My stack traces includes often dvb_dmx_swfilter_packets(),
> > do_IRQ()/tasklets and sys_write()/vfs_write(). I can't scroll up in most
> > situations. Can anyone help me?
> > Is there an easy way to backup the data or replace the file system
> > without kernel crash in between?
>
> You should certainly be able to copy data off xfs via usb; if it's
> failing, I guess we'll need more info to find out why, but I'd suggest
> at least booting a newer livecd to do that copy and see if things fare
> better.

My idea was to shrink it and create a new partition to which I can copy the
data. As far as I understand, I need to mount it for the shrink process, so I
may run into kernel exceptions while shrinking.

>
> -Eric
>
> > Best regards
> > Juergen Urban

* Re: BUG() in end_page_writeback(), stack overflows and system speed decrease with XFS over USB
  2009-11-20 16:23   ` Juergen Urban
@ 2009-11-20 16:36     ` Eric Sandeen
  2009-11-20 17:08       ` Eric Sandeen
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2009-11-20 16:36 UTC (permalink / raw)
  To: Juergen Urban; +Cc: xfs

Juergen Urban wrote:
> On Thursday 19 November 2009 19:00:16 Eric Sandeen wrote:
>> Juergen Urban wrote:
>>> Hello,
>>>
>>> my machine is running very unstable since I use XFS on an external USB
>>> harddisc (855 GByte XFS partition on 1TByte). One problem was the stack
>>> overflows caused by the large stack use of XFS, USB, SCSI and VFS in
>>> Linux 2.6.23.13. NFS on XFS caused much more stack overflows. I think I
>>> got around the stack overflows by disabling preemption, SMP and NFS in
>>> Linux, but I am not sure about it. I think that I didn't got a message
>>> from the stack overflow detection after this.
>> Are you on 4k stacks?  To be honest I'd still expect things to be mostly
>> ok stack-wise even if so.
> 
> No, I am using 8k stacks.

Hmm.

...

> Now I tried linux-2.6.31.6. My system hangs in the start scripts. Maybe this 
> is caused by network scripts. I got the message that ehci_hcd need to be 
> loaded before uhci_hcd and ohci_hcd. I skipped uhci_hcd and ohci_hcd in 
> /etc/discover.conf. Now I have a higher performance with linux-2.6.23.13 and I 
> can record 3 normal streams in parallel on the with USB and XFS. But it is 
> still unstable. The last error what I got was in block_prepare_write 
> (fs/buffer.c). This caused follow up errors in do_invalidate_page() called by 
> xfs_get_blocks().
> Sometimes there are file system deadlocks. I can do everything, but not access 
> the file system. Every try to access the file system leads to a deadlock of the 
> program. This normally happens after a kernel exception.

I guess including the kernel messages you see would help us know what
might be going on.

...

>>> Is there an easy way to backup the data or replace the file system
>>> without kernel crash in between?
>> You should certainly be able to copy data off xfs via usb; if it's
>> failing, I guess we'll need more info to find out why, but I'd suggest
>> at least booting a newer livecd to do that copy and see if things fare
>> better.
> 
> My idea was to shrink it and create a new partition where I can copy the data. 
> As far as I understand I need to mount it for the shrink process, so I may 
> have the problem of kernel exceptions while shrinking.

You can't shrink xfs, if that's what you mean.

-Eric

* Re: BUG() in end_page_writeback(), stack overflows and system speed decrease with XFS over USB
  2009-11-20 16:36     ` Eric Sandeen
@ 2009-11-20 17:08       ` Eric Sandeen
  2009-11-21  1:00         ` Juergen Urban
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Sandeen @ 2009-11-20 17:08 UTC (permalink / raw)
  To: Juergen Urban; +Cc: xfs

Eric Sandeen wrote:
> Juergen Urban wrote:
>> On Thursday 19 November 2009 19:00:16 Eric Sandeen wrote:
>>> Juergen Urban wrote:
>>>> Hello,
>>>>
>>>> my machine is running very unstable since I use XFS on an external USB
>>>> harddisc (855 GByte XFS partition on 1TByte). One problem was the stack
>>>> overflows caused by the large stack use of XFS, USB, SCSI and VFS in
>>>> Linux 2.6.23.13. NFS on XFS caused much more stack overflows. I think I
>>>> got around the stack overflows by disabling preemption, SMP and NFS in
>>>> Linux, but I am not sure about it. I think that I didn't got a message
>>>> from the stack overflow detection after this.
>>> Are you on 4k stacks?  To be honest I'd still expect things to be mostly
>>> ok stack-wise even if so.
>> No, I am using 8k stacks.
> 
> Hmm.
> 

BTW if you are still seeing stack overflows when testing w/ the newer
kernel, we can use some tracing to see what the stack backtrace is:

sysctl -w kernel.stack_tracer_enabled=1
mount -t debugfs none /sys/kernel/debug/

and then look in:

/sys/kernel/debug/tracing/stack_trace
/sys/kernel/debug/tracing/stack_max_size

-Eric

* Re: BUG() in end_page_writeback(), stack overflows and system speed decrease with XFS over USB
  2009-11-20 17:08       ` Eric Sandeen
@ 2009-11-21  1:00         ` Juergen Urban
  2009-11-21 10:51           ` Michael Monnerie
  0 siblings, 1 reply; 9+ messages in thread
From: Juergen Urban @ 2009-11-21  1:00 UTC (permalink / raw)
  To: xfs

On Friday 20 November 2009 18:08:12 Eric Sandeen wrote:
> Eric Sandeen wrote:
> > Juergen Urban wrote:
> >> On Thursday 19 November 2009 19:00:16 Eric Sandeen wrote:
> >>> Juergen Urban wrote:
> >>>> Hello,
> >>>>
> >>>> my machine is running very unstable since I use XFS on an external USB
> >>>> harddisc (855 GByte XFS partition on 1TByte). One problem was the
> >>>> stack overflows caused by the large stack use of XFS, USB, SCSI and
> >>>> VFS in Linux 2.6.23.13. NFS on XFS caused much more stack overflows. I
> >>>> think I got around the stack overflows by disabling preemption, SMP
> >>>> and NFS in Linux, but I am not sure about it. I think that I didn't
> >>>> got a message from the stack overflow detection after this.
> >>>
> >>> Are you on 4k stacks?  To be honest I'd still expect things to be
> >>> mostly ok stack-wise even if so.
> >>
> >> No, I am using 8k stacks.
> >
> > Hmm.
>
> BTW if you are still seeing stack overflows when testing w/ the newer
> kernel, we can use some tracing to see what the stack backtrace is:
>
> sysctl -w kernel.stack_tracer_enabled=1
> mount -t debugfs none /sys/kernel/debug/
>
> and then look in:
>
> /sys/kernel/debug/tracing/stack_trace
> /sys/kernel/debug/tracing/stack_max_size
>

I don't think that I still get stack overflows in Linux 2.6.23.13.

Now I have downloaded the new KNOPPIX with Linux 2.6.31.6. I also get crashes
with it, but this could also be a stack overflow, because the kernel
configuration includes SMP and preemption. The problem here is that I don't
see any kernel message. I just see the blinking keyboard LEDs or sometimes
even nothing. To get the error I just need to play video with mplayer and
don't need to write to the file system at all. I need to wait until I have a
good backtrace, but this will be with Linux 2.6.23.13.
The console tools of my installed Debian can't configure or use
/dev/tty[0-9]* under Linux 2.6.31.6. The ioctls are not supported. I
installed a newer version of the package, but this doesn't help. I copied the
".config" from Linux 2.6.23.13 to 2.6.31.6. Maybe this is not compatible.

* Re: BUG() in end_page_writeback(), stack overflows and system speed decrease with XFS over USB
  2009-11-21  1:00         ` Juergen Urban
@ 2009-11-21 10:51           ` Michael Monnerie
  2009-11-21 17:33             ` Juergen Urban
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Monnerie @ 2009-11-21 10:51 UTC (permalink / raw)
  To: xfs


On Samstag, 21. November 2009 Juergen Urban wrote:
> To get the error I just need to play video with mplayer and don't 
> need to write to the file system at all.

This really sounds like broken hardware. Have you tried running
MEMTEST?

mfg zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4

* Re: BUG() in end_page_writeback(), stack overflows and system speed decrease with XFS over USB
  2009-11-21 10:51           ` Michael Monnerie
@ 2009-11-21 17:33             ` Juergen Urban
  2009-11-21 22:04               ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Juergen Urban @ 2009-11-21 17:33 UTC (permalink / raw)
  To: xfs

On Saturday 21 November 2009 11:51:23 Michael Monnerie wrote:
> On Samstag, 21. November 2009 Juergen Urban wrote:
> > To get the error I just need to play video with mplayer and don't
> > need to write to the file system at all.
>
> This really sounds like a broken hardware. Have you tried running
> MEMTEST?
>
> mfg zmi

At first I also thought that it was a hardware problem. I already ran
MEMTEST for some hours, but it didn't find anything. I replaced the main
board, the CPU and the power supply. I also tried to replace/remove RAM.
However, I did all this while chasing the stack overflow, so maybe this is a
problem that I introduced by replacing the hardware. So I will try to change
all the hardware back.
Can someone explain the purpose of the BUG() in end_page_writeback()? Can I 
remove the line?
Will XFS also work if I disable all address operations? What is the purpose of 
the address operations?

* Re: BUG() in end_page_writeback(), stack overflows and system speed decrease with XFS over USB
  2009-11-21 17:33             ` Juergen Urban
@ 2009-11-21 22:04               ` Dave Chinner
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Chinner @ 2009-11-21 22:04 UTC (permalink / raw)
  To: Juergen Urban; +Cc: xfs

On Sat, Nov 21, 2009 at 06:33:20PM +0100, Juergen Urban wrote:
> Can someone explain the purpose of the BUG() in end_page_writeback()? Can I 
> remove the line?

A page that is under writeback is supposed to have the PG_writeback
flag set. Hence when writeback is completed (i.e. the page is now
clean) we need to clear the PG_writeback bit. The BUG is triggered
if we are ending writeback on a page that does not have PG_writeback
set. IOWs, something is seriously wrong, and could be a memory error
or memory corruption.
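
To make the check above concrete, here is a minimal user-space sketch in C.
It is not kernel code: struct model_page, MODEL_PG_WRITEBACK,
MODEL_PG_RECLAIM, model_test_clear() and model_end_page_writeback() are
hypothetical names invented for illustration. The sketch assumes a single
thread and leaves out atomicity, rotate_reclaimable_page() and the wake-up
of waiters, so it only models the "was the flag really set?" test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's page flag bits. */
enum {
        MODEL_PG_WRITEBACK = 1 << 0,
        MODEL_PG_RECLAIM   = 1 << 1
};

/* Hypothetical, heavily reduced model of struct page. */
struct model_page {
        unsigned int flags;
};

/*
 * Clear a flag and report whether it was set beforehand.
 * The kernel does this atomically; this model does not.
 */
static bool model_test_clear(struct model_page *page, unsigned int flag)
{
        bool was_set;

        assert(page != NULL);
        was_set = (page->flags & flag) != 0;
        page->flags &= ~flag;
        return was_set;
}

/*
 * Model of ending writeback on a page. Returns true if the page really
 * was under writeback, false in the case where the kernel would BUG().
 */
static bool model_end_page_writeback(struct model_page *page)
{
        /* Simplification: drop the reclaim hint, skip the LRU rotation. */
        model_test_clear(page, MODEL_PG_RECLAIM);

        if (!model_test_clear(page, MODEL_PG_WRITEBACK)) {
                fprintf(stderr, "model: PG_writeback not set, kernel would BUG()\n");
                return false;
        }
        return true;
}
```

Calling model_end_page_writeback() twice on a page that started with the
writeback flag set returns true the first time and false the second time.
The second call corresponds to the situation described above: writeback is
being ended on a page whose flag is already clear, for example because the
completion ran twice or the flags word was corrupted.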

> Will XFS also work if I disable all address operations?

No.

> What is the purpose of the address operations?

Reading Documentation/filesystems/vfs.txt might answer your
questions....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
