public inbox for linux-xfs@vger.kernel.org
* Segmentation fault during xfs_repair
@ 2009-06-05 22:22 Richard Kolkovich
  2009-06-06  2:45 ` Eric Sandeen
  2009-06-06  4:44 ` Eric Sandeen
  0 siblings, 2 replies; 6+ messages in thread
From: Richard Kolkovich @ 2009-06-05 22:22 UTC
  To: xfs

We have a corrupted XFS partition on a storage server.  Attempting to run xfs_repair the first time yielded the message about a corrupt log file, so I have run xfs_repair with -L to clear that.  Now, xfs_repair segfaults in Phase 3.  I have tried -P and a huge -m to no avail.  It always seems to segfault at the same point:

bad directory block magic # 0 in block 11 for directory inode 341521797
corrupt block 11 in directory inode 341521797
        will junk block
Segmentation fault (core dumped)

Here is the backtrace:

(gdb) bt
#0  traverse_int_dir2block (mp=0x7fff4243c1d0, da_cursor=0x7fff4243bca0, rbno=0x7fff4243bd98)
    at dir2.c:358
#1  0x000000000041650e in process_node_dir2 () at dir2.c:1940
#2  process_leaf_node_dir2 (mp=0x7fff4243c1d0, ino=341521797, dip=0x27131600, ino_discovery=1, 
    dirname=0x46bdcd "", parent=0x7fff4243c080, blkmap=0x7f8828b5c060, dot=0x7fff4243be6c, 
    dotdot=0x7fff4243be68, repair=0x7fff4243be64, isnode=1) at dir2.c:2033
#3  0x00000000004182cc in process_dir2 (mp=0x7fff4243c1d0, ino=341521797, dip=0x27131600, 
    ino_discovery=1, dino_dirty=0x7fff4243c090, dirname=0x46bdcd "", parent=0x7fff4243c080, 
    blkmap=0x7f8828b5c060) at dir2.c:2086
#4  0x000000000040f9dc in process_dinode_int (mp=0x7fff4243c1d0, dino=0x27131600, agno=5, 
    ino=5977477, was_free=0, dirty=0x7fff4243c090, used=0x7fff4243c094, verify_mode=0, 
    uncertain=0, ino_discovery=1, check_dups=0, extra_attr_check=1, isa_dir=0x7fff4243c08c, 
    parent=0x7fff4243c080) at dinode.c:2668
#5  0x000000000040fbae in process_dinode (mp=0x7fff4254c438, dino=0x7fff4254c418, agno=939524166, 
    ino=5888, was_free=46501, dirty=0x3, used=0x7fff4243c094, ino_discovery=1, check_dups=0, 
    extra_attr_check=1, isa_dir=0x7fff4243c08c, parent=0x7fff4243c080) at dinode.c:2779
#6  0x00000000004088f6 in process_inode_chunk (mp=0x7fff4243c1d0, agno=5, 
    num_inos=<value optimized out>, first_irec=0x2198f60, ino_discovery=1, check_dups=0, 
    extra_attr_check=1, bogus=0x7fff4243c114) at dino_chunks.c:778
#7  0x0000000000408edd in process_aginodes (mp=0x7fff4243c1d0, pf_args=0x7f88284d97b0, agno=5, 
    ino_discovery=1, check_dups=0, extra_attr_check=1) at dino_chunks.c:1024
#8  0x000000000041bfdf in process_ag_func (wq=0x2003730, agno=5, arg=0x7f88284d97b0)
    at phase3.c:161
#9  0x000000000041c79b in process_ags () at phase3.c:200
#10 phase3 (mp=0x7fff4243c1d0) at phase3.c:239
#11 0x0000000000432435 in main (argc=<value optimized out>, argv=<value optimized out>)
    at xfs_repair.c:719

I can provide the full core file, if need be (956M).  The xfs_metadump can be found at:

http://files.intrameta.com/metadump.gz (735M)

Any suggestions/ideas on how to proceed are welcome.  Please Reply-All, as I'm not subscribed to the ML.

Thanks,

-- 

Richard Kolkovich
IntraMeta Corporation
richard@intrameta.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Segmentation fault during xfs_repair
  2009-06-05 22:22 Segmentation fault during xfs_repair Richard Kolkovich
@ 2009-06-06  2:45 ` Eric Sandeen
  2009-06-06  3:14   ` Richard Kolkovich
  2009-06-06  4:44 ` Eric Sandeen
  1 sibling, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2009-06-06  2:45 UTC
  To: Richard Kolkovich; +Cc: xfs

Richard Kolkovich wrote:
> We have a corrupted XFS partition on a storage server.  Attempting to run xfs_repair the first time yielded the message about a corrupt log file, so I have run xfs_repair with -L to clear that.  Now, xfs_repair segfaults in Phase 3.  I have tried -P and a huge -m to no avail.  It always seems to segfault at the same point:
> 
> bad directory block magic # 0 in block 11 for directory inode 341521797
> corrupt block 11 in directory inode 341521797
>         will junk block
> Segmentation fault (core dumped)

For starters, which xfsprogs version.... if not latest, try latest... if
latest, I'll grab that metadump image and see if I can reproduce it.

-Eric

> Here is the backtrace:
> 
> (gdb) bt
> #0  traverse_int_dir2block (mp=0x7fff4243c1d0, da_cursor=0x7fff4243bca0, rbno=0x7fff4243bd98)
>     at dir2.c:358
> #1  0x000000000041650e in process_node_dir2 () at dir2.c:1940
> #2  process_leaf_node_dir2 (mp=0x7fff4243c1d0, ino=341521797, dip=0x27131600, ino_discovery=1, 
>     dirname=0x46bdcd "", parent=0x7fff4243c080, blkmap=0x7f8828b5c060, dot=0x7fff4243be6c, 
>     dotdot=0x7fff4243be68, repair=0x7fff4243be64, isnode=1) at dir2.c:2033
> #3  0x00000000004182cc in process_dir2 (mp=0x7fff4243c1d0, ino=341521797, dip=0x27131600, 
>     ino_discovery=1, dino_dirty=0x7fff4243c090, dirname=0x46bdcd "", parent=0x7fff4243c080, 
>     blkmap=0x7f8828b5c060) at dir2.c:2086
> #4  0x000000000040f9dc in process_dinode_int (mp=0x7fff4243c1d0, dino=0x27131600, agno=5, 
>     ino=5977477, was_free=0, dirty=0x7fff4243c090, used=0x7fff4243c094, verify_mode=0, 
>     uncertain=0, ino_discovery=1, check_dups=0, extra_attr_check=1, isa_dir=0x7fff4243c08c, 
>     parent=0x7fff4243c080) at dinode.c:2668
> #5  0x000000000040fbae in process_dinode (mp=0x7fff4254c438, dino=0x7fff4254c418, agno=939524166, 
>     ino=5888, was_free=46501, dirty=0x3, used=0x7fff4243c094, ino_discovery=1, check_dups=0, 
>     extra_attr_check=1, isa_dir=0x7fff4243c08c, parent=0x7fff4243c080) at dinode.c:2779
> #6  0x00000000004088f6 in process_inode_chunk (mp=0x7fff4243c1d0, agno=5, 
>     num_inos=<value optimized out>, first_irec=0x2198f60, ino_discovery=1, check_dups=0, 
>     extra_attr_check=1, bogus=0x7fff4243c114) at dino_chunks.c:778
> #7  0x0000000000408edd in process_aginodes (mp=0x7fff4243c1d0, pf_args=0x7f88284d97b0, agno=5, 
>     ino_discovery=1, check_dups=0, extra_attr_check=1) at dino_chunks.c:1024
> #8  0x000000000041bfdf in process_ag_func (wq=0x2003730, agno=5, arg=0x7f88284d97b0)
>     at phase3.c:161
> #9  0x000000000041c79b in process_ags () at phase3.c:200
> #10 phase3 (mp=0x7fff4243c1d0) at phase3.c:239
> #11 0x0000000000432435 in main (argc=<value optimized out>, argv=<value optimized out>)
>     at xfs_repair.c:719
> 
> I can provide the full core file, if need be (956M).  The xfs_metadump can be found at:
> 
> http://files.intrameta.com/metadump.gz (735M)
> 
> Any suggestions/ideas on how to proceed are welcome.  Please Reply-All, as I'm not subscribed to the ML.
> 
> Thanks,
> 


* Re: Segmentation fault during xfs_repair
  2009-06-06  2:45 ` Eric Sandeen
@ 2009-06-06  3:14   ` Richard Kolkovich
  2009-06-06  3:25     ` Eric Sandeen
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Kolkovich @ 2009-06-06  3:14 UTC
  To: Eric Sandeen; +Cc: xfs@oss.sgi.com

On Fri, Jun 05, 2009 at 10:45:37PM -0400, Eric Sandeen wrote:
> Richard Kolkovich wrote:
> > We have a corrupted XFS partition on a storage server.  Attempting to run xfs_repair the first time yielded the message about a corrupt log file, so I have run xfs_repair with -L to clear that.  Now, xfs_repair segfaults in Phase 3.  I have tried -P and a huge -m to no avail.  It always seems to segfault at the same point:
> > 
> > bad directory block magic # 0 in block 11 for directory inode 341521797
> > corrupt block 11 in directory inode 341521797
> >         will junk block
> > Segmentation fault (core dumped)
> 
> For starters, which xfsprogs version.... if not latest, try latest... if
> latest, I'll grab that metadump image and see if I can reproduce it.
> 
> -Eric

Sorry - forgot to mention that.

Running on Fedora 11 (64-bit).  Tried using 2.10.2 (from yum) and building from the latest stable source (3.0.1).  Let me know if I should try a dev build.

Thanks,

-- 

Richard Kolkovich
IntraMeta Corporation
richard@intrameta.com


* Re: Segmentation fault during xfs_repair
  2009-06-06  3:14   ` Richard Kolkovich
@ 2009-06-06  3:25     ` Eric Sandeen
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Sandeen @ 2009-06-06  3:25 UTC
  To: Richard Kolkovich; +Cc: xfs@oss.sgi.com

Richard Kolkovich wrote:
> On Fri, Jun 05, 2009 at 10:45:37PM -0400, Eric Sandeen wrote:
>> Richard Kolkovich wrote:
>>> We have a corrupted XFS partition on a storage server.
>>> Attempting to run xfs_repair the first time yielded the message
>>> about a corrupt log file, so I have run xfs_repair with -L to
>>> clear that.  Now, xfs_repair segfaults in Phase 3.  I have tried
>>> -P and a huge -m to no avail.  It always seems to segfault at the
>>> same point:
>>> 
>>> bad directory block magic # 0 in block 11 for directory inode 341521797
>>> corrupt block 11 in directory inode 341521797
>>>         will junk block
>>> Segmentation fault (core dumped)
>> For starters, which xfsprogs version.... if not latest, try
>> latest... if latest, I'll grab that metadump image and see if I can
>> reproduce it.
>> 
>> -Eric
> 
> Sorry - forgot to mention that.
> 
> Running on Fedora 11 (64bit).  Tried using 2.10.2 (from yum) and
> building from latest stable source (3.0.1).  Let me know if I should
> try a dev build.

(Hm, did I really leave F11 at 2.10.2?  I thought it was newer, but anyway)

No, I doubt anything else has fixed this since 3.0.1.

I'll try pulling down that metadump image & see what I can see.

Feel free to file an xfsprogs bug with fedora, too, so the issue doesn't
get lost...

-Eric

> Thanks,
> 


* Re: Segmentation fault during xfs_repair
  2009-06-05 22:22 Segmentation fault during xfs_repair Richard Kolkovich
  2009-06-06  2:45 ` Eric Sandeen
@ 2009-06-06  4:44 ` Eric Sandeen
  2009-06-06  5:10   ` Eric Sandeen
  1 sibling, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2009-06-06  4:44 UTC
  To: Richard Kolkovich; +Cc: xfs

Richard Kolkovich wrote:
> We have a corrupted XFS partition on a storage server.  Attempting to run xfs_repair the first time yielded the message about a corrupt log file, so I have run xfs_repair with -L to clear that.  Now, xfs_repair segfaults in Phase 3.  I have tried -P and a huge -m to no avail.  It always seems to segfault at the same point:
> 
> bad directory block magic # 0 in block 11 for directory inode 341521797
> corrupt block 11 in directory inode 341521797
>         will junk block
> Segmentation fault (core dumped)

...

> I can provide the full core file, if need be (956M).  The xfs_metadump can be found at:
> 
> http://files.intrameta.com/metadump.gz (735M)
> 
> Any suggestions/ideas on how to proceed are welcome.  Please Reply-All, as I'm not subscribed to the ML.

OK, on a -g (not -O2) build:

Program terminated with signal 11, Segmentation fault.
#0  0x0000000000418d05 in traverse_int_dir2block (mp=0x7ffff4c4f150,
da_cursor=0x7ffff4c4eb30, rbno=0x7ffff4c4ebdc) at dir2.c:356
356			da_cursor->level[i].hashval =
(gdb) p i
$1 = 46501

i is set from

i = da_cursor->active = be16_to_cpu(node->hdr.level);

(gdb) p node->hdr.level // note this is big endian
$2 = 42421
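
The two numbers above are the same 16-bit field seen in two byte orders. A minimal sketch of that relationship, assuming a little-endian host such as x86_64; swap16 is a hypothetical stand-in for what be16_to_cpu amounts to there, not the xfsprogs implementation:

```c
#include <stdint.h>

/* Hypothetical helper: swap the two bytes of a 16-bit value.  On a
 * little-endian host this is the effect of converting a big-endian
 * on-disk field to CPU order. */
static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v >> 8) | (v << 8));
}

/* Raw field as gdb printed it (42421 == 0xa5b5); swapping the bytes
 * gives 0xb5a5 == 46501, the value that ended up in i. */
static uint16_t level_from_disk(void)
{
	return swap16(42421);
}
```

Calling level_from_disk() returns 46501, matching the value of i above. In either byte order the level is in the tens of thousands.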

that's a crazily deep btree, well beyond anything sane:

#define XFS_DA_NODE_MAXDEPTH    5       /* max depth of Btree */

So repair really should be checking for this before it goes off and
indexes it:

356			da_cursor->level[i].hashval =

because the cursor only has this much in the array:

        dir2_level_state_t      level[XFS_DA_NODE_MAXDEPTH];
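
A rough sketch of the kind of check meant here, using simplified stand-in types rather than the real repair structures. The struct layouts and the cursor_set_level helper are assumptions made for illustration; only XFS_DA_NODE_MAXDEPTH and its value come from the source quoted above:

```c
#include <stdint.h>

#define XFS_DA_NODE_MAXDEPTH 5	/* max depth of Btree, as quoted above */

/* Simplified stand-ins: only the fields needed for the illustration. */
struct level_state {
	uint32_t hashval;
};

struct cursor {
	int active;
	struct level_state level[XFS_DA_NODE_MAXDEPTH];
};

/* Hypothetical helper: store hashval at the depth read from disk, but
 * refuse any depth that would index past the end of level[].
 * Returns 0 on success, -1 if the depth is out of range. */
static int cursor_set_level(struct cursor *c, unsigned int depth,
			    uint32_t hashval)
{
	if (depth >= XFS_DA_NODE_MAXDEPTH)
		return -1;	/* corrupt header: leave level[] untouched */
	c->active = (int)depth;
	c->level[depth].hashval = hashval;
	return 0;
}
```

With a depth of 46501 the helper returns -1 instead of writing far outside the five-element array, which is where the unchecked code crashed.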

I'll have to ponder what repair should do in this case ... and I'll see
if there's something we can do in xfs_db to just whack out this problem
and let repair continue for now.

-Eric


* Re: Segmentation fault during xfs_repair
  2009-06-06  4:44 ` Eric Sandeen
@ 2009-06-06  5:10   ` Eric Sandeen
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Sandeen @ 2009-06-06  5:10 UTC
  To: Richard Kolkovich; +Cc: xfs

Eric Sandeen wrote:

> I'll have to ponder what repair should do in this case ... and I'll see
> if there's something we can do in xfs_db to just whack out this problem
> and let repair continue for now.
>
> -Eric
>
>   
This should get you over that hump, I think:


--- xfsprogs-3.0.1.orig/repair/dir2.c	2009-06-06 00:01:10.711081870 -0500
+++ xfsprogs-3.0.1/repair/dir2.c	2009-06-06 00:05:52.993365954 -0500
@@ -353,6 +353,14 @@
 			}
 		}
 
+		if (i >= XFS_DA_NODE_MAXDEPTH) {
+			do_warn(_("bad header depth for directory inode %llu\n"),
+				da_cursor->ino);
+			da_brelse(bp);
+			i = -1;
+			goto error_out;
+		}
+
 		da_cursor->level[i].hashval =
 					be32_to_cpu(node->btree[0].hashval);
 		da_cursor->level[i].bp = bp;


but I have to say, that is one fried filesystem you've got there....

-Eric

