public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* xfs_repair breaks with assertion
@ 2013-04-11  5:25 Victor K
  2013-04-11  6:25 ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: Victor K @ 2013-04-11  5:25 UTC (permalink / raw)
  To: xfs


[-- Attachment #1.1: Type: text/plain, Size: 4380 bytes --]

Hello,

I'm trying to repair an XFS file system on our mdadm raid6 array after a
sudden system failure.
Running xfs_repair /dev/md1 the first time resulted in a suggestion to
mount/unmount the filesystem to replay the log, but mounting would not work.
Running xfs_repair -v -L -P /dev/md1 produces lots of output on stderr
(moving to Phase 3, then more output - not sure if it is relevant; the
log file is ~170 MB in size), then it stops and prints a single line on
stdout:

xfs_repair: dinode.c:768: process_bmbt_reclist_int: Assertion `i < *numrecs' failed.
Aborted

After inserting a printf before the assert, I get the following:

i = 0, *numrecs = -570425343  for printf("%d, %d")
or
i = 0, *numrecs = 3724541953  for printf("%ld, %ld") - which makes me
wonder if it's signed/unsigned int related.

Both trip the if (i > *numrecs) conditional.

The filesystem size is 10 TB (7x2 TB disks in raid6) and it is about 8 TB full.

xfsprogs version is 3.1.10 compiled from git source this morning.

The system is Ubuntu 12.04.2 with kernel version 3.8.5.

When I try to run xfs_metadump, it crashes:
*** glibc detected *** xfs_db: double free or corruption (!prev): 0x0000000000da8000 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7eb96)[0x7f3d9501cb96]
xfs_db[0x417383]
xfs_db[0x41a941]
xfs_db[0x419030]
xfs_db[0x41a85c]
xfs_db[0x419030]
xfs_db[0x41b89e]
xfs_db[0x4050c0]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f3d94fbf76d]
xfs_db[0x4051c5]
======= Memory map: ========
00400000-0046e000 r-xp 00000000 08:81 1837319          /usr/sbin/xfs_db
0066d000-0066e000 r--p 0006d000 08:81 1837319          /usr/sbin/xfs_db
0066e000-0066f000 rw-p 0006e000 08:81 1837319          /usr/sbin/xfs_db
0066f000-00682000 rw-p 00000000 00:00 0
00d63000-00dc4000 rw-p 00000000 00:00 0                [heap]
7f3d94d88000-7f3d94d9d000 r-xp 00000000 08:81 2363486  /lib/x86_64-linux-gnu/libgcc_s.so.1
7f3d94d9d000-7f3d94f9c000 ---p 00015000 08:81 2363486  /lib/x86_64-linux-gnu/libgcc_s.so.1
7f3d94f9c000-7f3d94f9d000 r--p 00014000 08:81 2363486  /lib/x86_64-linux-gnu/libgcc_s.so.1
7f3d94f9d000-7f3d94f9e000 rw-p 00015000 08:81 2363486  /lib/x86_64-linux-gnu/libgcc_s.so.1
7f3d94f9e000-7f3d95153000 r-xp 00000000 08:81 2423054  /lib/x86_64-linux-gnu/libc-2.15.so
7f3d95153000-7f3d95352000 ---p 001b5000 08:81 2423054  /lib/x86_64-linux-gnu/libc-2.15.so
7f3d95352000-7f3d95356000 r--p 001b4000 08:81 2423054  /lib/x86_64-linux-gnu/libc-2.15.so
7f3d95356000-7f3d95358000 rw-p 001b8000 08:81 2423054  /lib/x86_64-linux-gnu/libc-2.15.so
7f3d95358000-7f3d9535d000 rw-p 00000000 00:00 0
7f3d9535d000-7f3d95375000 r-xp 00000000 08:81 2423056  /lib/x86_64-linux-gnu/libpthread-2.15.so
7f3d95375000-7f3d95574000 ---p 00018000 08:81 2423056  /lib/x86_64-linux-gnu/libpthread-2.15.so
7f3d95574000-7f3d95575000 r--p 00017000 08:81 2423056  /lib/x86_64-linux-gnu/libpthread-2.15.so
7f3d95575000-7f3d95576000 rw-p 00018000 08:81 2423056  /lib/x86_64-linux-gnu/libpthread-2.15.so
7f3d95576000-7f3d9557a000 rw-p 00000000 00:00 0
7f3d9557a000-7f3d9557e000 r-xp 00000000 08:81 2359972  /lib/x86_64-linux-gnu/libuuid.so.1.3.0
7f3d9557e000-7f3d9577d000 ---p 00004000 08:81 2359972  /lib/x86_64-linux-gnu/libuuid.so.1.3.0
7f3d9577d000-7f3d9577e000 r--p 00003000 08:81 2359972  /lib/x86_64-linux-gnu/libuuid.so.1.3.0
7f3d9577e000-7f3d9577f000 rw-p 00004000 08:81 2359972  /lib/x86_64-linux-gnu/libuuid.so.1.3.0
7f3d9577f000-7f3d957a1000 r-xp 00000000 08:81 2423068  /lib/x86_64-linux-gnu/ld-2.15.so
7f3d957ba000-7f3d957fb000 rw-p 00000000 00:00 0
7f3d957fb000-7f3d95985000 r--p 00000000 08:81 1967430  /usr/lib/locale/locale-archive
7f3d95985000-7f3d95989000 rw-p 00000000 00:00 0
7f3d9599d000-7f3d959a1000 rw-p 00000000 00:00 0
7f3d959a1000-7f3d959a2000 r--p 00022000 08:81 2423068  /lib/x86_64-linux-gnu/ld-2.15.so
7f3d959a2000-7f3d959a4000 rw-p 00023000 08:81 2423068  /lib/x86_64-linux-gnu/ld-2.15.so
7fffa80d8000-7fffa80f9000 rw-p 00000000 00:00 0        [stack]
7fffa8170000-7fffa8171000 r-xp 00000000 00:00 0        [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0  [vsyscall]
Aborted

It produces a file of 4325376 bytes - not sure if that's right, as I have
read about dump files on the order of 80 MB.

If I try now (after running xfs_repair -L) to mount the fs read-only, it
mounts but says some directories have structures that need cleaning, so the
dirs are inaccessible.

Any suggestion on how to possibly fix this?

Thanks!
Victor


[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: xfs_repair breaks with assertion
  2013-04-11  5:25 xfs_repair breaks with assertion Victor K
@ 2013-04-11  6:25 ` Dave Chinner
  2013-04-11  6:34   ` Victor K
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2013-04-11  6:25 UTC (permalink / raw)
  To: Victor K; +Cc: xfs

On Thu, Apr 11, 2013 at 01:25:24PM +0800, Victor K wrote:
> Hello,
> 
> I'm trying to repair an XFS file system on our mdadm raid6 array after
> sudden system failure.
> Running xfs_repair /dev/md1 the first time resulted in suggestion to
> mount/unmount to replay log, but mounting would not work. After running
> xfs_repair -v -L -P /dev/md1 this happens:
> (lots of output on stderr, moving to Phase 3, then more output - not sure
> if it is relevant, the log file is ~170Mb in size), then stops and prints
> the only line on stdout:

Oh dear. A log file that big indicates that something *bad* has
happened to the array, i.e. that it has most likely been put back
together wrong.

Before going any further with xfs_repair, please verify that the
array has been put back together correctly....

> xfs_repair: dinode.c:768: process_bmbt_reclist_int: Assertion `i <
> *numrecs' failed.
> Aborted
> 
> After inserting a printf before the assert, I get the following:
> 
> i = 0, *numrecs = -570425343  for printf( "%d, %d")
> or
> i= 0, *numrecs = 3724541953  for printf("%ld, %ld) - makes me wonder if
> it's signed/unsigned int related

numrecs is way out of the normal range, so that's probably what is
triggering it.

i.e. this in process_exinode():

	numrecs = XFS_DFORK_NEXTENTS(dip, whichfork);

is where the bad number is coming from, and that implies a corrupted
inode. It's a __be32 on disk, but the kernel considers it an
xfs_extnum_t in memory, which is an int32_t, because:

#define NULLEXTNUM      ((xfs_extnum_t)-1)

So, negative numbers on disk are invalid.
....

The patch below should fix the assert failure.

> If I try now (after running xfs_repair -L) to mount the fs read-only, it
> mounts but says some directories have structures that need cleaning, so the
> dirs are inaccessible.
> 
> Any suggestion on how to possibly fix this?

I suspect you've damaged it beyond repair now.

If the array was put back together incorrectly in the first place
(which is likely given the damage being reported), then
you've made the problem a whole lot worse by writing to it in an
attempt to repair it.

I'd suggest that you make sure the array is correctly
repaired/ordered/recovered before doing anything else, then run
xfs_repair on what is left and hope for the best. Even after
repair is finished, you'll need to go through all the data with a
fine-toothed comb to work out what has been lost, corrupted or
overwritten with zeros or other stuff.

I suspect you'll be reaching for the backup tapes long before you
get that far, though...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


xfs_repair: validate on-disk extent count better

From: Dave Chinner <dchinner@redhat.com>

When scanning a btree format inode, we trust the extent count to be
in range.  However, values in the range 2^31 <= cnt < 2^32 are
invalid and can cause problems with signed range checks. This
results in assert failures when validating the extent count, such
as:

xfs_repair: dinode.c:768: process_bmbt_reclist_int: Assertion `i < *numrecs' failed.

Validate that the extent count is at least within the positive range
of a signed 32-bit integer before using it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 repair/dinode.c |   25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/repair/dinode.c b/repair/dinode.c
index 5a2da39..239bb7b 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -1293,7 +1293,7 @@ process_exinode(
 	xfs_bmbt_rec_t		*rp;
 	xfs_dfiloff_t		first_key;
 	xfs_dfiloff_t		last_key;
-	int			numrecs;
+	int32_t			numrecs;
 	int			ret;
 
 	lino = XFS_AGINO_TO_INO(mp, agno, ino);
@@ -1302,6 +1302,15 @@ process_exinode(
 	numrecs = XFS_DFORK_NEXTENTS(dip, whichfork);
 
 	/*
+	 * We've already decided on the maximum number of extents on the inode,
+	 * and numrecs may be corrupt. Hence make sure we only allow numrecs to
+	 * be in the range of valid on-disk numbers, which is:
+	 *	0 < numrecs < 2^31 - 1
+	 */
+	if (numrecs < 0)
+		numrecs = *nex;
+
+	/*
 	 * XXX - if we were going to fix up the btree record,
 	 * we'd do it right here.  For now, if there's a problem,
 	 * we'll bail out and presumably clear the inode.
@@ -2038,11 +2047,23 @@ process_inode_data_fork(
 {
 	xfs_ino_t	lino = XFS_AGINO_TO_INO(mp, agno, ino);
 	int		err = 0;
+	int		nex;
+
+	/*
+	 * extent count on disk is only valid for positive values. The kernel
+	 * uses negative values in memory. hence if we see negative numbers
+	 * here, trash it!
+	 */
+	nex = be32_to_cpu(dino->di_nextents);
+	if (nex < 0)
+		*nextents = 1;
+	else
+		*nextents = nex;
 
-	*nextents = be32_to_cpu(dino->di_nextents);
 	if (*nextents > be64_to_cpu(dino->di_nblocks))
 		*nextents = 1;
 
+
 	if (dino->di_format != XFS_DINODE_FMT_LOCAL && type != XR_INO_RTDATA)
 		*dblkmap = blkmap_alloc(*nextents, XFS_DATA_FORK);
 	*nextents = 0;


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: xfs_repair breaks with assertion
  2013-04-11  6:25 ` Dave Chinner
@ 2013-04-11  6:34   ` Victor K
  2013-04-11  7:02     ` Dave Chinner
  2013-04-11  9:55     ` Stan Hoeppner
  0 siblings, 2 replies; 5+ messages in thread
From: Victor K @ 2013-04-11  6:34 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


[-- Attachment #1.1: Type: text/plain, Size: 5954 bytes --]

> Running xfs_repair /dev/md1 the first time resulted in suggestion to

> > mount/unmount to replay log, but mounting would not work. After running
> > xfs_repair -v -L -P /dev/md1 this happens:
> > (lots of output on stderr, moving to Phase 3, then more output - not sure
> > if it is relevant, the log file is ~170Mb in size), then stops and prints
> > the only line on stdout:
>
> Oh dear. A log file that big indicates that something *bad* has
> happened to the array. i.e that it has most likely been put back
> together wrong.
>
> Before going any further with xfs_repair, please verify that the
> array has been put back together correctly....
>
>
The raid array did not suffer, at least not according to mdadm; it is now
happily recovering the one disk that officially failed, but the whole thing
assembled without a problem.
There was a similar crash several weeks ago on this same array, but it had
an ext4 filesystem back then.
I was able to save some of the latest stuff, and decided to move to xfs as
something more reliable.
I suspect now that I should also have replaced the disk controller then.


> > xfs_repair: dinode.c:768: process_bmbt_reclist_int: Assertion `i <
> > *numrecs' failed.
> > Aborted
> >
> > After inserting a printf before the assert, I get the following:
> >
> > i = 0, *numrecs = -570425343  for printf( "%d, %d")
> > or
> > i= 0, *numrecs = 3724541953  for printf("%ld, %ld) - makes me wonder if
> > it's signed/unsigned int related
>
> numrecs is way out of the normal range, so that's probably what is
> triggering it.
>
> i.e this in process_exinode():
>
>         numrecs = XFS_DFORK_NEXTENTS(dip, whichfork);
>
> is where the bad number is coming from, and that implies a corrupted
> inode. it's a __be32 on disk, the kernel considers it a xfs_extnum_t
> in memory which is a int32_t because:
>
> #define NULLEXTNUM      ((xfs_extnum_t)-1)
>
> So, negative numbers on disk are invalid.
> ....
>
> The patch below should fix the assert failure.
>
>
I'll try it - I don't really have any other options at the moment.


> > If I try now (after running xfs_repair -L) to mount the fs read-only, it
> > mounts but says some directories have structures that need cleaning, so
> the
> > dirs are inaccessible.
> >
> > Any suggestion on how to possibly fix this?
>
> I suspect you've damaged it beyond repair now.
>
> If the array was put back together incorrectly in the first place
> (which is likely given the damage being reported), then
> you've made the problem a whole lot worse by writing to it in an
> attempt to repair it.
>
> I'd suggest that you make sure the array is correctly
> repaired/ordered/recovered before doing anything else, then
> running xfs_repair on what is left and hoping for the best. Even after
> repair is finished, you'll need to go through all the data with a
> fine toothed comb to work out what has been lost, corrupted or
> overwritten with zeros or other stuff.
>
> I suspect you'll be reaching for the backup tapes long before you
> get that far, though...
>


Well, we'll see how it goes.

Thanks for the patch and the quick reply!

Sincerely,
Victor





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: xfs_repair breaks with assertion
  2013-04-11  6:34   ` Victor K
@ 2013-04-11  7:02     ` Dave Chinner
  2013-04-11  9:55     ` Stan Hoeppner
  1 sibling, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2013-04-11  7:02 UTC (permalink / raw)
  To: Victor K; +Cc: xfs

On Thu, Apr 11, 2013 at 02:34:32PM +0800, Victor K wrote:
> > Running xfs_repair /dev/md1 the first time resulted in suggestion to
> 
> > > mount/unmount to replay log, but mounting would not work. After running
> > > xfs_repair -v -L -P /dev/md1 this happens:
> > > (lots of output on stderr, moving to Phase 3, then more output - not sure
> > > if it is relevant, the log file is ~170Mb in size), then stops and prints
> > > the only line on stdout:
> >
> > Oh dear. A log file that big indicates that something *bad* has
> > happened to the array. i.e that it has most likely been put back
> > together wrong.
> >
> > Before going any further with xfs_repair, please verify that the
> > array has been put back together correctly....
> >
> >
> The raid array did not suffer, at least, not according to mdadm; it is now
> happily recovering the one disk that officially failed, but the whole thing
> assembled without a problem

Yeah, we see this often enough that all I can say is this: don't
trust what mdadm is telling you. Validate it by hand.  Massive
corruption does not occur when everything is put back together
correctly.

> There was a similar crash several weeks ago on this same array, but it had
> an ext4 filesystem back then.
> I was able to save some of the latest stuff, and decided to move to xfs as
> something more reliable.

If the storage below the filesystem is unreliable, then changing
filesystems won't magically fix the problem.

> I suspect now that I should also have replaced the disk controller then.

Well, that depends on whether it is the problem or not. If you are
not using hardware raid, then disk controller problems rarely result
in massive corruption of filesystems. A busted block here or there,
maybe, but they generally do not cause entire disks to suddenly become
corrupted.

I'd still be looking at a RAID reassembly problem rather than a filesystem
or a storage hardware issue...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: xfs_repair breaks with assertion
  2013-04-11  6:34   ` Victor K
  2013-04-11  7:02     ` Dave Chinner
@ 2013-04-11  9:55     ` Stan Hoeppner
  1 sibling, 0 replies; 5+ messages in thread
From: Stan Hoeppner @ 2013-04-11  9:55 UTC (permalink / raw)
  To: xfs

On 4/11/2013 1:34 AM, Victor K wrote:

> The raid array did not suffer, at least, not according to mdadm; it is now
> happily recovering the one disk that officially failed, but the whole thing
> assembled without a problem
> There was a similar crash several weeks ago on this same array, but it had
> an ext4 filesystem back then.
> I was able to save some of the latest stuff, and decided to move to xfs as
> something more reliable.
> I suspect now that I should also have replaced the disk controller then.

Rebuilds are *supposed* to be transparent to the filesystem, but this is
not always the case, sometimes due to bugs.  In fact, we just recently
saw an LVM bug wherein a pvmove operation was not transparent, and hosed
up an XFS filesystem.  This is but one of many reasons I prefer hardware
based RAID and volume management.  It isolates these functions and RAID
memory structures from the kernel, and thus prevents such bugs from
causing problems.  This may or may not be the source of your apparent
XFS corruption.  We don't have enough (log) data to ascertain the cause
at this point.

Running repair on an 8/10 TB filesystem while md is rebuilding the
underlying RAID6 array isn't something I'd put a lot of trust in.  Wait
until the rebuild is finished and then run a non-destructive repair
(xfs_repair -n).  Compare the results to the previous repair.

-- 
Stan


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-04-11  9:55 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-11  5:25 xfs_repair breaks with assertion Victor K
2013-04-11  6:25 ` Dave Chinner
2013-04-11  6:34   ` Victor K
2013-04-11  7:02     ` Dave Chinner
2013-04-11  9:55     ` Stan Hoeppner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox