* Segmentation fault during xfs_repair
@ 2009-06-05 22:22 Richard Kolkovich
2009-06-06 2:45 ` Eric Sandeen
2009-06-06 4:44 ` Eric Sandeen
0 siblings, 2 replies; 6+ messages in thread
From: Richard Kolkovich @ 2009-06-05 22:22 UTC (permalink / raw)
To: xfs
We have a corrupted XFS partition on a storage server. The first run of xfs_repair yielded a message about a corrupt log, so I ran xfs_repair with -L to clear it. Now xfs_repair segfaults in Phase 3. I have tried -P and a huge -m to no avail. It always seems to segfault at the same point:
bad directory block magic # 0 in block 11 for directory inode 341521797
corrupt block 11 in directory inode 341521797
will junk block
Segmentation fault (core dumped)
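In case it helps, the sequence of invocations was roughly the following (device path hypothetical; note that -L zeroes the log and discards any unreplayed metadata updates, so it was a last resort):

```shell
DEV=/dev/sdX1   # hypothetical; substitute the real partition

# First run: complained about a corrupt/dirty log and stopped.
#   xfs_repair "$DEV"
# Zero the log so repair can proceed (loses unreplayed log updates):
#   xfs_repair -L "$DEV"
# Variants tried after the Phase 3 segfault, with the same result:
#   xfs_repair -P "$DEV"         # disable inode/directory-block prefetching
#   xfs_repair -m 16384 "$DEV"   # raise the approximate memory cap (MB)
```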
Here is the backtrace:
(gdb) bt
#0 traverse_int_dir2block (mp=0x7fff4243c1d0, da_cursor=0x7fff4243bca0, rbno=0x7fff4243bd98)
at dir2.c:358
#1 0x000000000041650e in process_node_dir2 () at dir2.c:1940
#2 process_leaf_node_dir2 (mp=0x7fff4243c1d0, ino=341521797, dip=0x27131600, ino_discovery=1,
dirname=0x46bdcd "", parent=0x7fff4243c080, blkmap=0x7f8828b5c060, dot=0x7fff4243be6c,
dotdot=0x7fff4243be68, repair=0x7fff4243be64, isnode=1) at dir2.c:2033
#3 0x00000000004182cc in process_dir2 (mp=0x7fff4243c1d0, ino=341521797, dip=0x27131600,
ino_discovery=1, dino_dirty=0x7fff4243c090, dirname=0x46bdcd "", parent=0x7fff4243c080,
blkmap=0x7f8828b5c060) at dir2.c:2086
#4 0x000000000040f9dc in process_dinode_int (mp=0x7fff4243c1d0, dino=0x27131600, agno=5,
ino=5977477, was_free=0, dirty=0x7fff4243c090, used=0x7fff4243c094, verify_mode=0,
uncertain=0, ino_discovery=1, check_dups=0, extra_attr_check=1, isa_dir=0x7fff4243c08c,
parent=0x7fff4243c080) at dinode.c:2668
#5 0x000000000040fbae in process_dinode (mp=0x7fff4254c438, dino=0x7fff4254c418, agno=939524166,
ino=5888, was_free=46501, dirty=0x3, used=0x7fff4243c094, ino_discovery=1, check_dups=0,
extra_attr_check=1, isa_dir=0x7fff4243c08c, parent=0x7fff4243c080) at dinode.c:2779
#6 0x00000000004088f6 in process_inode_chunk (mp=0x7fff4243c1d0, agno=5,
num_inos=<value optimized out>, first_irec=0x2198f60, ino_discovery=1, check_dups=0,
extra_attr_check=1, bogus=0x7fff4243c114) at dino_chunks.c:778
#7 0x0000000000408edd in process_aginodes (mp=0x7fff4243c1d0, pf_args=0x7f88284d97b0, agno=5,
ino_discovery=1, check_dups=0, extra_attr_check=1) at dino_chunks.c:1024
#8 0x000000000041bfdf in process_ag_func (wq=0x2003730, agno=5, arg=0x7f88284d97b0)
at phase3.c:161
#9 0x000000000041c79b in process_ags () at phase3.c:200
#10 phase3 (mp=0x7fff4243c1d0) at phase3.c:239
#11 0x0000000000432435 in main (argc=<value optimized out>, argv=<value optimized out>)
at xfs_repair.c:719
I can provide the full core file, if need be (956M). The xfs_metadump can be found at:
http://files.intrameta.com/metadump.gz (735M)
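For anyone who wants to reproduce this offline, the dump can be restored to a sparse image and repaired there; roughly (file and device names hypothetical; xfs_metadump captures metadata only, not file contents):

```shell
IMG=fs.img   # hypothetical sparse image for offline repair runs

# On the damaged machine:
#   xfs_metadump /dev/sdX1 metadump && gzip metadump
# On the analysis machine:
#   gunzip metadump.gz
#   xfs_mdrestore metadump "$IMG"
#   xfs_repair -f "$IMG"   # -f: operate on an image file, not a device
```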
Any suggestions/ideas on how to proceed are welcome. Please Reply-All, as I'm not subscribed to the ML.
Thanks,
--
Richard Kolkovich
IntraMeta Corporation
richard@intrameta.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Segmentation fault during xfs_repair
2009-06-05 22:22 Segmentation fault during xfs_repair Richard Kolkovich
@ 2009-06-06 2:45 ` Eric Sandeen
2009-06-06 3:14 ` Richard Kolkovich
2009-06-06 4:44 ` Eric Sandeen
1 sibling, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2009-06-06 2:45 UTC (permalink / raw)
To: Richard Kolkovich; +Cc: xfs
Richard Kolkovich wrote:
> We have a corrupted XFS partition on a storage server. Attempting to run xfs_repair the first time yielded the message about a corrupt log file, so I have run xfs_repair with -L to clear that. Now, xfs_repair segfaults in Phase 3. I have tried -P and a huge -m to no avail. It always seems to segfault at the same point:
>
> bad directory block magic # 0 in block 11 for directory inode 341521797
> corrupt block 11 in directory inode 341521797
> will junk block
> Segmentation fault (core dumped)
For starters, which xfsprogs version.... if not latest, try latest... if
latest, I'll grab that metadump image and see if I can reproduce it.
-Eric
* Re: Segmentation fault during xfs_repair
2009-06-06 2:45 ` Eric Sandeen
@ 2009-06-06 3:14 ` Richard Kolkovich
2009-06-06 3:25 ` Eric Sandeen
0 siblings, 1 reply; 6+ messages in thread
From: Richard Kolkovich @ 2009-06-06 3:14 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs@oss.sgi.com
On Fri, Jun 05, 2009 at 10:45:37PM -0400, Eric Sandeen wrote:
> Richard Kolkovich wrote:
> > We have a corrupted XFS partition on a storage server. Attempting to run xfs_repair the first time yielded the message about a corrupt log file, so I have run xfs_repair with -L to clear that. Now, xfs_repair segfaults in Phase 3. I have tried -P and a huge -m to no avail. It always seems to segfault at the same point:
> >
> > bad directory block magic # 0 in block 11 for directory inode 341521797
> > corrupt block 11 in directory inode 341521797
> > will junk block
> > Segmentation fault (core dumped)
>
> For starters, which xfsprogs version.... if not latest, try latest... if
> latest, I'll grab that metadump image and see if I can reproduce it.
>
> -Eric
Sorry - forgot to mention that.
Running on Fedora 11 (64-bit). I tried 2.10.2 (from yum) and a build from the latest stable source (3.0.1). Let me know if I should try a dev build.
Thanks,
--
Richard Kolkovich
IntraMeta Corporation
richard@intrameta.com
* Re: Segmentation fault during xfs_repair
2009-06-06 3:14 ` Richard Kolkovich
@ 2009-06-06 3:25 ` Eric Sandeen
0 siblings, 0 replies; 6+ messages in thread
From: Eric Sandeen @ 2009-06-06 3:25 UTC (permalink / raw)
To: Richard Kolkovich; +Cc: xfs@oss.sgi.com
Richard Kolkovich wrote:
> On Fri, Jun 05, 2009 at 10:45:37PM -0400, Eric Sandeen wrote:
>> Richard Kolkovich wrote:
>>> We have a corrupted XFS partition on a storage server.
>>> Attempting to run xfs_repair the first time yielded the message
>>> about a corrupt log file, so I have run xfs_repair with -L to
>>> clear that. Now, xfs_repair segfaults in Phase 3. I have tried
>>> -P and a huge -m to no avail. It always seems to segfault at the
>>> same point:
>>>
>>> bad directory block magic # 0 in block 11 for directory inode
>>> 341521797 corrupt block 11 in directory inode 341521797 will junk
>>> block Segmentation fault (core dumped)
>> For starters, which xfsprogs version.... if not latest, try
>> latest... if latest, I'll grab that metadump image and see if I can
>> reproduce it.
>>
>> -Eric
>
> Sorry - forgot to mention that.
>
> Running on Fedora 11 (64bit). Tried using 2.10.2 (from yum) and
> building from latest stable source (3.0.1). Let me know if I should
> try a dev build.
(Hm, did I really leave F11 at 2.10.2? I thought it was newer, but anyway)
No, I doubt anything else has fixed this since 3.0.1
I'll try pulling down that metadump image & see what I can see.
Feel free to file an xfsprogs bug with fedora, too, so the issue doesn't
get lost...
-Eric
* Re: Segmentation fault during xfs_repair
2009-06-05 22:22 Segmentation fault during xfs_repair Richard Kolkovich
2009-06-06 2:45 ` Eric Sandeen
@ 2009-06-06 4:44 ` Eric Sandeen
2009-06-06 5:10 ` Eric Sandeen
1 sibling, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2009-06-06 4:44 UTC (permalink / raw)
To: Richard Kolkovich; +Cc: xfs
Richard Kolkovich wrote:
> We have a corrupted XFS partition on a storage server. Attempting to run xfs_repair the first time yielded the message about a corrupt log file, so I have run xfs_repair with -L to clear that. Now, xfs_repair segfaults in Phase 3. I have tried -P and a huge -m to no avail. It always seems to segfault at the same point:
>
> bad directory block magic # 0 in block 11 for directory inode 341521797
> corrupt block 11 in directory inode 341521797
> will junk block
> Segmentation fault (core dumped)
...
> I can provide the full core file, if need be (956M). The xfs_metadump can be found at:
>
> http://files.intrameta.com/metadump.gz (735M)
>
> Any suggestions/ideas on how to proceed are welcome. Please Reply-All, as I'm not subscribed to the ML.
Ok, on a -g (not -O2) build:
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000418d05 in traverse_int_dir2block (mp=0x7ffff4c4f150,
da_cursor=0x7ffff4c4eb30, rbno=0x7ffff4c4ebdc) at dir2.c:356
356 da_cursor->level[i].hashval =
(gdb) p i
$1 = 46501
i is set from
i = da_cursor->active = be16_to_cpu(node->hdr.level);
(gdb) p node->hdr.level // note this is big endian
$2 = 42421
that's a crazily deep btree, well beyond anything sane:
#define XFS_DA_NODE_MAXDEPTH 5 /* max depth of Btree */
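(As a sanity check, byte-swapping the raw value reproduces the bogus index exactly; a quick one-liner equivalent of be16_to_cpu() on a little-endian host:)

```shell
raw=42421   # node->hdr.level as gdb printed it (0xa5b5, big-endian on disk)
level=$(( ((raw >> 8) | (raw << 8)) & 0xFFFF ))   # what be16_to_cpu() returns
echo "decoded level = $level"   # -> decoded level = 46501, i.e. the crashing i
# XFS_DA_NODE_MAXDEPTH is 5, so level[46501] indexes far past the array
```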
So repair really should be checking for this before it goes off and
indexes it:
356 da_cursor->level[i].hashval =
because the cursor only has this much in the array:
dir2_level_state_t level[XFS_DA_NODE_MAXDEPTH];
I'll have to ponder what repair should do in this case ... and I'll see
if there's something we can do in xfs_db to just whack out this problem
and let repair continue for now.
-Eric
* Re: Segmentation fault during xfs_repair
2009-06-06 4:44 ` Eric Sandeen
@ 2009-06-06 5:10 ` Eric Sandeen
0 siblings, 0 replies; 6+ messages in thread
From: Eric Sandeen @ 2009-06-06 5:10 UTC (permalink / raw)
To: Richard Kolkovich; +Cc: xfs
Eric Sandeen wrote:
> I'll have to ponder what repair should do in this case ... and I'll see
> if there's something we can do in xfs_db to just whack out this problem
> and let repair continue for now.
>
> -Eric
This should get you over that hump I think:
--- xfsprogs-3.0.1.orig/repair/dir2.c	2009-06-06 00:01:10.711081870 -0500
+++ xfsprogs-3.0.1/repair/dir2.c	2009-06-06 00:05:52.993365954 -0500
@@ -353,6 +353,14 @@
 		}
 	}
 
+	if (i >= XFS_DA_NODE_MAXDEPTH) {
+		do_warn(_("bad header depth for directory inode %llu\n"),
+			da_cursor->ino);
+		da_brelse(bp);
+		i = -1;
+		goto error_out;
+	}
+
 	da_cursor->level[i].hashval =
 				be32_to_cpu(node->btree[0].hashval);
 	da_cursor->level[i].bp = bp;
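If you want to try it on the restored image, the usual drill is something like (filenames hypothetical; the hunk above saved as fix.diff):

```shell
PATCH=fix.diff   # hypothetical: the hunk above, saved to a file

# cd xfsprogs-3.0.1
# patch -p1 < "$PATCH"
# make
# repair/xfs_repair -f fs.img   # image restored earlier with xfs_mdrestore
```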
but I have to say, that is one fried filesystem you've got there....
-Eric