public inbox for linux-xfs@vger.kernel.org
* xfs corruption - structure need cleaning
@ 2013-10-15 13:45 Roy Zhang
  2013-10-15 13:54 ` Carlos Maiolino
  2013-10-15 20:43 ` [xfs-masters] " Dave Chinner
  0 siblings, 2 replies; 4+ messages in thread
From: Roy Zhang @ 2013-10-15 13:45 UTC
  To: xfs-masters, xfs

Hi,
I hit a problem where an XFS filesystem cannot be mounted; the log is below.
I understand that xfs_repair -L will fix this situation, but I want to know
how and why it happened. Is it a bug in XFS or in the HDD? Is there a patch
to fix it?

Thanks
Roy

Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.232985] XFS (dm-3): Mounting Filesystem
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.327773] XFS (dm-3): Internal error xlog_clear_stale_blocks(2) at line 1353 of file fs/xfs/xfs_log_recover.c.  Caller 0xffffffffa01f894d
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.327776]
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400435] Pid: 20055, comm: mount Not tainted 2.6.32-902.279.9.1.letv.el6.x86_64 #1
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400437] Call Trace:
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400477] [<ffffffffa01e0f1f>] ? xfs_error_report+0x3f/0x50 [xfs]
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400503] [<ffffffffa01f894d>] ? xlog_find_tail+0x38d/0x3c0 [xfs]
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400526] [<ffffffffa01f5266>] ? xlog_clear_stale_blocks+0x156/0x190 [xfs]
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400549] [<ffffffffa01f894d>] ? xlog_find_tail+0x38d/0x3c0 [xfs]
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400570] [<ffffffffa01f899e>] ? xlog_recover+0x1e/0x90 [xfs]
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400592] [<ffffffffa01f04dc>] ? xfs_log_mount+0xac/0x190 [xfs]
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400615] [<ffffffffa01fbb8b>] ? xfs_mountfs+0x36b/0x680 [xfs]
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400640] [<ffffffffa02137b4>] ? xfs_fs_fill_super+0x234/0x360 [xfs]
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400647] [<ffffffff811ed4ca>] ? disk_name+0xba/0xc0
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400653] [<ffffffff8117e54e>] ? get_sb_bdev+0x18e/0x1d0
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400675] [<ffffffffa0213580>] ? xfs_fs_fill_super+0x0/0x360 [xfs]
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400698] [<ffffffffa02113a8>] ? xfs_fs_get_sb+0x18/0x20 [xfs]
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400703] [<ffffffff8117dfdb>] ? vfs_kern_mount+0x7b/0x1b0
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400707] [<ffffffff8117e182>] ? do_kern_mount+0x52/0x130
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400713] [<ffffffff8119c852>] ? do_mount+0x2d2/0x8d0
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400719] [<ffffffff81136014>] ? strndup_user+0x64/0xc0
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400723] [<ffffffff8119cee0>] ? sys_mount+0x90/0xe0
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400730] [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400734] XFS (dm-3): failed to locate log tail
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400737] XFS (dm-3): log mount/recovery failed: error 117
Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400816] XFS (dm-3): log mount failed
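
For readers wondering how "error 117" in the log relates to the "structure
need cleaning" in the subject line: on Linux, errno 117 is EUCLEAN, whose
message text is "Structure needs cleaning", and XFS uses that value to report
on-disk corruption. The sketch below is only an illustration of that mapping.
The function name, the regular expression, and the one-entry lookup table are
assumptions made for this example, and it assumes the wrapped log entry has
been joined onto one line.

```python
import re

# Linux errno value seen in the log. XFS reports filesystem corruption
# with this code; the table is deliberately limited to this one entry.
ERRNO_NAMES = {117: ("EUCLEAN", "Structure needs cleaning")}

# Matches the "log mount/recovery failed" line from the kernel log above.
MOUNT_FAIL = re.compile(
    r"XFS \((?P<dev>[^)]+)\): log mount/recovery failed: error (?P<err>\d+)"
)


def decode_mount_failure(line):
    """Return (device, errno, name, message) for a matching line, else None."""
    match = MOUNT_FAIL.search(line)
    if match is None:
        return None
    code = int(match.group("err"))
    name, text = ERRNO_NAMES.get(code, ("unknown", "unknown"))
    return match.group("dev"), code, name, text


if __name__ == "__main__":
    sample = ("Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.400737] "
              "XFS (dm-3): log mount/recovery failed: error 117")
    print(decode_mount_failure(sample))
```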

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfs corruption - structure need cleaning
  2013-10-15 13:45 xfs corruption - structure need cleaning Roy Zhang
@ 2013-10-15 13:54 ` Carlos Maiolino
  2013-10-15 20:43 ` [xfs-masters] " Dave Chinner
  1 sibling, 0 replies; 4+ messages in thread
From: Carlos Maiolino @ 2013-10-15 13:54 UTC
  To: xfs

I'm not the best person to ask about the XFS journal, but I don't think the
information you sent is enough to diagnose the problem. All I can see is a
failure to mount a filesystem that appears to have some kind of log corruption,
so the log could not be replayed. In this case you would need to clear the log
with the -L option, as you said, but you may lose some data.

What happened before you tried to mount the filesystem (i.e. how it was
unmounted, and whether any errors occurred before that) would be more useful
for identifying possible issues that led to the log corruption.

This is a good guide to what kind of information might be useful:

http://www.xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F



-- 
Carlos


* Re: [xfs-masters] xfs corruption - structure need cleaning
  2013-10-15 13:45 xfs corruption - structure need cleaning Roy Zhang
  2013-10-15 13:54 ` Carlos Maiolino
@ 2013-10-15 20:43 ` Dave Chinner
       [not found]   ` <CAMg3XqptephuOPEJ-iiQF+Bs5Kp7MO=D-siw6fbNMG4=2CQGKg@mail.gmail.com>
  1 sibling, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2013-10-15 20:43 UTC
  To: Roy Zhang; +Cc: xfs-masters, xfs

On Tue, Oct 15, 2013 at 09:45:27PM +0800, Roy Zhang wrote:
> Hi,
> I hit a problem where an XFS filesystem cannot be mounted; the log is below.
> I understand that xfs_repair -L will fix this situation, but I want to know
> how and why it happened. Is it a bug in XFS or in the HDD? Is there a patch
> to fix it?
> 
> Thanks
> Roy
> 
> Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.232985] XFS
> (dm-3): Mounting Filesystem
> Oct  9 18:26:52 mcluster-alpha-node3 kernel: [11840.327773] XFS
> (dm-3): Internal error xlog_clear_stale_blocks(2) at line 1353 of file
> fs/xfs/xfs_log_recover.c.  Caller 0xffffffffa01f894d

The head and tail of the log are confused - different cycle numbers
but the tail is behind the head. That implies that there are 3 cycle
numbers visible in the log, when there should only be 2, which would
mean that some log write did not make it to disk correctly.
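
To make the head/tail reasoning above concrete, here is a minimal sketch of
that kind of consistency check on a circular log. It is a simplified Python
model, not the kernel code: the function, exception, and parameter names are
hypothetical, and the two numbered cases are only assumed to correspond to
the "(1)" and "(2)" suffixes that appear in error reports such as
xlog_clear_stale_blocks(2).

```python
EUCLEAN = 117  # Linux errno "Structure needs cleaning" (the "error 117" in the log)


class LogCorruption(Exception):
    """Raised when the head and tail positions cannot both be valid."""

    def __init__(self, which):
        super().__init__(f"inconsistent log head/tail (case {which})")
        self.which = which
        self.errno = EUCLEAN


def stale_block_distance(head_cycle, head_block, tail_cycle, tail_block, log_blocks):
    """Return the number of blocks from the head forward to the tail.

    The log is circular: the cycle number goes up by one each time the
    head wraps, so at most two adjacent cycle numbers should be visible.
    """
    if head_cycle == tail_cycle:
        # Same cycle: the head must be at or after the tail, inside the log.
        if head_block < tail_block or head_block >= log_blocks:
            raise LogCorruption(1)
        return tail_block + (log_blocks - head_block)
    # Different cycles: the head must have wrapped exactly once and must
    # still be strictly before the tail's block position.
    if head_block >= tail_block or head_cycle != tail_cycle + 1:
        raise LogCorruption(2)
    return tail_block - head_block


if __name__ == "__main__":
    # Cycle numbers differ but the tail block is behind the head block.
    try:
        stale_block_distance(5, 900, 4, 100, 1000)
    except LogCorruption as exc:
        print(exc, exc.errno)
```

With a head one cycle ahead of the tail, the head block has to sit before the
tail block. A head block at or past the tail in that situation would mean the
blocks in between belong to a third cycle, which is the inconsistency
described above.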

You'll need to provide a copy of the log (xfs_logprint can get that
for you) and the information about your system described here:

http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [xfs-masters] xfs corruption - structure need cleaning
       [not found]   ` <CAMg3XqptephuOPEJ-iiQF+Bs5Kp7MO=D-siw6fbNMG4=2CQGKg@mail.gmail.com>
@ 2013-10-16  1:46     ` Dave Chinner
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2013-10-16  1:46 UTC
  To: Roy Zhang; +Cc: xfs-masters, xfs

On Wed, Oct 16, 2013 at 08:12:00AM +0800, Roy Zhang wrote:
> Hi Dave,
> I am using an SSD and an HDD together through flashcache. The info is below.
> kernel version 2.6.32.220

So you're using out-of-tree modules in the IO path, on a custom
CentOS 6.3 kernel, and you are getting random hangs waiting for IO
completion.

FWIW, 15,000 lines of log files is not the information I asked for,
but this:

[1535047.183083] MEMBlaze Hardware IO Request Irresponsible

indicates that you are using some kind of PCIe flash hardware from a
Chinese startup that doesn't have in-kernel drivers or English
documentation.  There's no way we can really help you diagnose IO
stack problems given these conditions.

FWIW, your logs indicate that something is going wrong in your IO
stack, not with XFS.  XFS is triggering the hung task timer waiting
for IO completion, and only after many, many reboots as a result of
these hangs do you see a log corruption when trying to mount the
filesystem.

So - look to flashcache or your hardware as the source of your
problem...

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

