public inbox for linux-kernel@vger.kernel.org
* XFS internal error xfs_da_do_buf(1) at line 2015 of file fs/xfs/xfs_da_btree.c
@ 2009-01-04  1:16 Adam Nielsen
  2009-01-04  7:46 ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Adam Nielsen @ 2009-01-04  1:16 UTC (permalink / raw)
  To: LKML Mailinglist

Hi all,

I'm having a recurring problem with XFS that started about a day ago.  All of 
a sudden, when a certain part of the disk is read (not sure where, but my 
nightly backups trigger it), I get an endless stream of these messages in my 
logs:

xfs_da_do_buf: bno 8388608
dir: inode 3087268096
Filesystem "md0": XFS internal error xfs_da_do_buf(1) at line 2015 of file 
fs/xfs/xfs_da_btree.c.  Caller 0xffffffff802eba63
Pid: 4445, comm: metalog Tainted: P           2.6.28-rc2 #3
Call Trace:
  [<ffffffff802eba63>] xfs_da_read_buf+0x24/0x29
  [<ffffffff802eb6aa>] xfs_da_do_buf+0x2d2/0x621
  [<ffffffff80267fb0>] balance_dirty_pages_ratelimited_nr+0x300/0x329
  [<ffffffff802a52af>] block_write_end+0x4a/0x54
  [<ffffffff802eba63>] xfs_da_read_buf+0x24/0x29
  [<ffffffff802ecd0a>] xfs_da_node_lookup_int+0x5b/0x225
  [<ffffffff802ecd0a>] xfs_da_node_lookup_int+0x5b/0x225
  [<ffffffff802f2908>] xfs_dir2_node_lookup+0x43/0xe7
  [<ffffffff802edbff>] xfs_dir2_isleaf+0x19/0x4a
  [<ffffffff802ee357>] xfs_dir_lookup+0x10f/0x14f
  [<ffffffff80297a74>] __d_lookup+0x11a/0x143
  [<ffffffff80314b4b>] xfs_lookup+0x48/0xa5
  [<ffffffff8031d06e>] xfs_vn_lookup+0x3c/0x78
  [<ffffffff8028fafa>] __lookup_hash+0xfa/0x11e
  [<ffffffff8029288d>] do_filp_open+0x159/0x7d7
  [<ffffffff80507856>] _spin_unlock+0x10/0x2a
  [<ffffffff8029a608>] alloc_fd+0x112/0x123
  [<ffffffff8028701e>] do_sys_open+0x48/0xcc
  [<ffffffff8020b3bb>] system_call_fastpath+0x16/0x1b

It doesn't seem to interfere with filesystem use, but metalog is logging 
thousands of these messages to the system log files (the logs grow at 
about 1 MB/minute).

Does anyone know what this error means?  Do I need to reformat the filesystem?

I've restarted a few times and the error goes away until the next nightly 
backup triggers it again.  Killing metalog does seem to stop the messages, so 
perhaps one of the log files is the culprit?  I'm not sure how to map that 
inode or bno back to a filename.  It's always the same bno/inode and always 
reports metalog as the offending program.

Any suggestions how to go about diagnosing this problem?

Many thanks,
Adam.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: XFS internal error xfs_da_do_buf(1) at line 2015 of file fs/xfs/xfs_da_btree.c
  2009-01-04  1:16 XFS internal error xfs_da_do_buf(1) at line 2015 of file fs/xfs/xfs_da_btree.c Adam Nielsen
@ 2009-01-04  7:46 ` Christoph Hellwig
  2009-01-04  9:03   ` Adam Nielsen
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2009-01-04  7:46 UTC (permalink / raw)
  To: Adam Nielsen; +Cc: LKML Mailinglist

On Sun, Jan 04, 2009 at 11:16:22AM +1000, Adam Nielsen wrote:
> Hi all,
>
> I'm having a recurring problem with XFS which started about a day ago.  
> All of a sudden when reading a certain part of the disk (not sure where, 
> but my nightly backups trigger it) I get an infinite loop of these 
> messages appearing in my logs:
>
> xfs_da_do_buf: bno 8388608
> dir: inode 3087268096
> Filesystem "md0": XFS internal error xfs_da_do_buf(1) at line 2015 of 
> file fs/xfs/xfs_da_btree.c.  Caller 0xffffffff802eba63
> Pid: 4445, comm: metalog Tainted: P           2.6.28-rc2 #3

This is a typical result of a power-loss scenario with write caches
enabled and without barriers.  Given that md can't pass barriers
through, did you disable the write caches on your disks?

> Does anyone know what this error means?  Do I need to reformat the filesystem?

Run xfs_repair over it to fix up the directory, and make sure to
configure your disks properly so that it doesn't happen again.
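As for mapping the inode from the error message back to a pathname, find's
-inum test is the usual tool.  A self-contained sketch (it looks up the inode
of a known file, /etc/passwd, so it can run anywhere; for the report above you
would search the md0 mount point for inode 3087268096 instead):

```shell
# Sketch: map an inode number back to a pathname with find(8).
# stat -c %i prints the inode number; find -inum then searches for it.
# -xdev keeps find on one filesystem, since inode numbers are only
# unique per filesystem.  This can be slow on large trees.
ino=$(stat -c %i /etc/passwd)
find /etc -xdev -maxdepth 1 -inum "$ino" 2>/dev/null
```

Expect this to take a while when run over a whole multi-terabyte mount.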



* Re: XFS internal error xfs_da_do_buf(1) at line 2015 of file fs/xfs/xfs_da_btree.c
  2009-01-04  7:46 ` Christoph Hellwig
@ 2009-01-04  9:03   ` Adam Nielsen
  2009-01-04  9:23     ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: Adam Nielsen @ 2009-01-04  9:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: LKML Mailinglist

> This is a typical result of a power loss scenario with write caches
> enabled and without barriers.  Given that md can't pass through barriers
> did you disable the write caches on your disk?

No, I didn't realise I had to do that... in fact, I didn't even realise SATA 
disks *had* write caches; I thought the cache was for reading only...

> Run xfs_repair over it to fix up the directory, and make sure to
> configure your disks properly so that it doesn't happen again.

Will do, thanks for the advice!  Is there any standard way to disable write 
caching on a SATA disk?  hdparm -W seems to do the trick, but then I can't run 
that until the system is up and running, leaving a small window of opportunity 
for something to go wrong.

Thanks again,
Adam.



* Re: XFS internal error xfs_da_do_buf(1) at line 2015 of file fs/xfs/xfs_da_btree.c
  2009-01-04  9:03   ` Adam Nielsen
@ 2009-01-04  9:23     ` Christoph Hellwig
  2009-01-04 11:44       ` Alan Cox
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2009-01-04  9:23 UTC (permalink / raw)
  To: Adam Nielsen; +Cc: Christoph Hellwig, LKML Mailinglist

On Sun, Jan 04, 2009 at 07:03:23PM +1000, Adam Nielsen wrote:
> No, I didn't realise I had to do that...in fact I didn't even realise 
> SATA disks *had* write caches, I thought the cache was for reading 
> only...

Which would be the better default (it's what high-end disks generally
ship with).  I've been wondering for a while how we can make default
setups safer in the presence of lvm/dm, but there hasn't been any
progress yet.

>> Run xfs_repair over it to fix up the directory, and make sure to
>> configure your disks properly so that it doesn't happen again.
>
> Will do, thanks for the advice!  Is there any standard way to disable 
> write caching on a SATA disk?  hdparm -W seems to do the trick, but then 
> I can't run that until the system is up and running, leaving a small 
> window of opportunity for something to go wrong.

On Debian-based systems you can add -W0 to /etc/default/hdparm and
it gets executed before the root filesystem is remounted read-write;
I'm not sure how other distributions handle it.
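For reference, the fragment looks roughly like this; the variable names are
from memory of the Debian hdparm package and may differ by release, so treat
them as assumptions and check the comments shipped in the file itself:

```shell
# Hypothetical /etc/default/hdparm fragment (Debian); applied early in
# boot, before the root filesystem goes read-write.
harddisks="/dev/sda /dev/sdb"   # devices to configure at boot
hdparm_opts="-W0"               # disable the on-drive write cache
```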



* Re: XFS internal error xfs_da_do_buf(1) at line 2015 of file fs/xfs/xfs_da_btree.c
  2009-01-04  9:23     ` Christoph Hellwig
@ 2009-01-04 11:44       ` Alan Cox
  2009-01-04 15:34         ` Christoph Hellwig
  2009-01-05  5:12         ` markus reichelt
  0 siblings, 2 replies; 8+ messages in thread
From: Alan Cox @ 2009-01-04 11:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Adam Nielsen, Christoph Hellwig, LKML Mailinglist

> On Debian based systems you can add -W0 to /etc/default/hdparm and
> it gets executed before the root filesystem is remounted read-write,
> I'm not sure how other distributions handle it.

Generally they avoid setting -W0 because it ruins performance and can be
very bad for disk lifetime. The barriers code is there for a reason.

Of course certain distributions default to using LVM for all their file
systems which is completely and mindbogglingly bogus. That both messes up
barriers in some cases and takes a good 10-20% off performance when I've
benched it.

LVM is cool if you need it; most people don't.

Alan


* Re: XFS internal error xfs_da_do_buf(1) at line 2015 of file fs/xfs/xfs_da_btree.c
  2009-01-04 11:44       ` Alan Cox
@ 2009-01-04 15:34         ` Christoph Hellwig
  2009-01-04 15:50           ` Andi Kleen
  2009-01-05  5:12         ` markus reichelt
  1 sibling, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2009-01-04 15:34 UTC (permalink / raw)
  To: Alan Cox; +Cc: Christoph Hellwig, Adam Nielsen, LKML Mailinglist

On Sun, Jan 04, 2009 at 11:44:25AM +0000, Alan Cox wrote:
> Generally they avoid setting -W0 because it ruins performance and can be
> very bad for disk lifetime. The barriers code is there for a reason.

We've done measurements, and for modern NCQ/TCQ disks the performance of
cache off vs. cache on + barriers is close.  For ext3, barriers are
generally slightly faster; for XFS it's even, or sometimes cache off is
faster, depending on the workload.

> Of course certain distributions default to using LVM for all their file
> systems which is completely and mindbogglingly bogus. That both messes up
> barriers in some cases and takes a good 10-20% off performance when I've
> benched it.

The thing is that there's no reason for that at all with just a single
underlying disk.  There is absolutely no reason for not passing through
barriers, and there's also no reason why it should be any slower than
our most trivial volume manager, the partition remapping code.  In
fact, there's no reason trivial device-mapper tables couldn't be handled
by the partition remapping code.



* Re: XFS internal error xfs_da_do_buf(1) at line 2015 of file fs/xfs/xfs_da_btree.c
  2009-01-04 15:34         ` Christoph Hellwig
@ 2009-01-04 15:50           ` Andi Kleen
  0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2009-01-04 15:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Alan Cox, Adam Nielsen, LKML Mailinglist, agk

Christoph Hellwig <hch@infradead.org> writes:
>
>> Of course certain distributions default to using LVM for all their file
>> systems which is completely and mindbogglingly bogus. That both messes up
>> barriers in some cases and takes a good 10-20% off performance when I've
>> benched it.
>
> The thing is that there's no reason for that at all with just a single
> underlying disk. 

I submitted patches to do exactly that in DM some time ago.
Unfortunately they still haven't made it in (as of 2.6.29 git), for
reasons unknown.

-Andi
-- 
ak@linux.intel.com


* Re: XFS internal error xfs_da_do_buf(1) at line 2015 of file fs/xfs/xfs_da_btree.c
  2009-01-04 11:44       ` Alan Cox
  2009-01-04 15:34         ` Christoph Hellwig
@ 2009-01-05  5:12         ` markus reichelt
  1 sibling, 0 replies; 8+ messages in thread
From: markus reichelt @ 2009-01-05  5:12 UTC (permalink / raw)
  To: LKML Mailinglist


* Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> > On Debian based systems you can add -W0 to /etc/default/hdparm
> > and it gets executed before the root filesystem is remounted
> > read-write, I'm not sure how other distributions handle it.
> 
> Generally they avoid setting -W0 because it ruins performance and
> can be very bad for disk lifetime. The barriers code is there for a
> reason.

This is the first time I've read about disabled write caching possibly
hurting disk lifetime.  Do you know of an article for further reading?

-- 
left blank, right bald



end of thread, other threads:[~2009-01-05  5:13 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-01-04  1:16 XFS internal error xfs_da_do_buf(1) at line 2015 of file fs/xfs/xfs_da_btree.c Adam Nielsen
2009-01-04  7:46 ` Christoph Hellwig
2009-01-04  9:03   ` Adam Nielsen
2009-01-04  9:23     ` Christoph Hellwig
2009-01-04 11:44       ` Alan Cox
2009-01-04 15:34         ` Christoph Hellwig
2009-01-04 15:50           ` Andi Kleen
2009-01-05  5:12         ` markus reichelt
