public inbox for linux-xfs@vger.kernel.org
* bad fs - xfs_repair 3.01 crashes on it
@ 2009-07-03 11:20 Michael Monnerie
  2009-07-03 18:34 ` Eric Sandeen
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Michael Monnerie @ 2009-07-03 11:20 UTC (permalink / raw)
  To: xfs mailing list



Tonight our server rebooted, and I found in /var/log/warn that it had been
complaining a lot about XFS since June 7 already:

Jun  7 03:06:31 orion.i.zmi.at kernel: Filesystem "dm-0": corrupt inode 3857051697 ((a)extents = 5).  Unmount and run xfs_repair.
Jun  7 03:06:31 orion.i.zmi.at kernel: Pid: 23230, comm: xfs_fsr Tainted: G          2.6.27.21-0.1-xen #1
Jun  7 03:06:31 orion.i.zmi.at kernel:
Jun  7 03:06:31 orion.i.zmi.at kernel: Call Trace:
Jun  7 03:06:31 orion.i.zmi.at kernel:  [<ffffffff8020c597>] show_trace_log_lvl+0x41/0x58
Jun  7 03:06:31 orion.i.zmi.at kernel:  [<ffffffff804635e0>] dump_stack+0x69/0x6f
Jun  7 03:06:31 orion.i.zmi.at kernel:  [<ffffffffa033bbcc>] xfs_iformat_extents+0xc9/0x1c5 [xfs]
Jun  7 03:06:31 orion.i.zmi.at kernel:  [<ffffffffa033c129>] xfs_iformat+0x2b0/0x3f6 [xfs]
Jun  7 03:06:31 orion.i.zmi.at kernel:  [<ffffffffa033c356>] xfs_iread+0xe7/0x1ed [xfs]
Jun  7 03:06:31 orion.i.zmi.at kernel:  [<ffffffffa0337920>] xfs_iget_core+0x3a5/0x63a [xfs]
Jun  7 03:06:31 orion.i.zmi.at kernel:  [<ffffffffa0337c97>] xfs_iget+0xe2/0x187 [xfs]
Jun  7 03:06:31 orion.i.zmi.at kernel:  [<ffffffffa0359302>] xfs_vget_fsop_handlereq+0xc2/0x11b [xfs]
Jun  7 03:06:31 orion.i.zmi.at kernel:  [<ffffffffa03593bb>] xfs_open_by_handle+0x60/0x1cb [xfs]
Jun  7 03:06:31 orion.i.zmi.at kernel:  [<ffffffffa0359f6a>] xfs_ioctl+0x3ca/0x680 [xfs]
Jun  7 03:06:31 orion.i.zmi.at kernel:  [<ffffffffa0357ff6>] xfs_file_ioctl+0x25/0x69 [xfs]
Jun  7 03:06:31 orion.i.zmi.at kernel:  [<ffffffff802aa8cd>] vfs_ioctl+0x21/0x6c
Jun  7 03:06:31 orion.i.zmi.at kernel:  [<ffffffff802aab3a>] do_vfs_ioctl+0x222/0x231
Jun  7 03:06:31 orion.i.zmi.at kernel:  [<ffffffff802aab9a>] sys_ioctl+0x51/0x73
Jun  7 03:06:31 orion.i.zmi.at kernel:  [<ffffffff8020b3b8>] system_call_fastpath+0x16/0x1b
Jun  7 03:06:31 orion.i.zmi.at kernel:  [<00007f7231d6cb77>] 0x7f7231d6cb77

But XFS didn't go offline, so nobody noticed these messages. There are a lot of them.
They have obviously been generated since then by the nightly "xfs_fsr -v -t 7200"
run. It would have been nice if xfs_fsr had printed a message, so we would have
received the cron mail. (But it got killed by the kernel, which is a fair excuse.)

Anyway, I then ran xfs_repair (3.01) and got this:

Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
[snip]
        - agno = 14
local inode 3857051697 attr too small (size = 3, min size = 4)
bad attribute fork in inode 3857051697, clearing attr fork
clearing inode 3857051697 attributes
cleared inode 3857051697
[snip]
Phase 4 - check for duplicate blocks...
[snip]
        - agno = 15
data fork in regular inode 3857051697 claims used block 537147998
xfs_repair: dinode.c:2108: process_inode_data_fork: Assertion `err == 0' failed.

And then xfs_repair crashes without having repaired anything. I have attached
the full xfs_repair log here, and the metadump is at
http://zmi.at/x/xfs.metadump.data1.bz2

I'll not be here for the next week; I hope the problem is not very serious.

mfg zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4


[-- Attachment #1.1.2: xfsrepair.data1 --]
[-- Type: text/plain, Size: 1930 bytes --]

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
local inode 3857051697 attr too small (size = 3, min size = 4)
bad attribute fork in inode 3857051697, clearing attr fork
clearing inode 3857051697 attributes
cleared inode 3857051697
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - agno = 33
        - agno = 34
        - agno = 35
        - agno = 36
        - agno = 37
        - agno = 38
        - agno = 39
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
data fork in regular inode 3857051697 claims used block 537147998
xfs_repair: dinode.c:2108: process_inode_data_fork: Assertion `err == 0' failed.


* Re: bad fs - xfs_repair 3.01 crashes on it
  2009-07-03 11:20 Michael Monnerie
@ 2009-07-03 18:34 ` Eric Sandeen
  2009-07-04  5:43 ` Eric Sandeen
  2009-07-12 18:52 ` Eric Sandeen
  2 siblings, 0 replies; 8+ messages in thread
From: Eric Sandeen @ 2009-07-03 18:34 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs mailing list

Michael Monnerie wrote:
> Tonight our server rebooted, and I found in /var/log/warn that it had been
> complaining a lot about XFS since June 7 already:
> 
> Jun  7 03:06:31 orion.i.zmi.at kernel: Filesystem "dm-0": corrupt inode 3857051697 ((a)extents = 5).  Unmount and run xfs_repair.

...

> But XFS didn't go offline, so nobody noticed these messages. There are a lot of them.
> They have obviously been generated since then by the nightly "xfs_fsr -v -t 7200"
> run. It would have been nice if xfs_fsr had printed a message, so we would have
> received the cron mail. (But it got killed by the kernel, which is a fair excuse.)

I'll have to think about why this didn't shut down the fs.  There are
only a few corruption cases that don't.

> Anyway, I then ran xfs_repair (3.01) and got this:
> 
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
> [snip]
>         - agno = 14
> local inode 3857051697 attr too small (size = 3, min size = 4)
> bad attribute fork in inode 3857051697, clearing attr fork
> clearing inode 3857051697 attributes
> cleared inode 3857051697
> [snip]
> Phase 4 - check for duplicate blocks...
> [snip]
>         - agno = 15
> data fork in regular inode 3857051697 claims used block 537147998
> xfs_repair: dinode.c:2108: process_inode_data_fork: Assertion `err == 0' failed.
> 
> And then xfs_repair crashes without having repaired anything. I have attached
> the full xfs_repair log here, and the metadump is at
> http://zmi.at/x/xfs.metadump.data1.bz2

Thanks for the metadump image, I'll try to take a look.

-Eric


* Re: bad fs - xfs_repair 3.01 crashes on it
  2009-07-03 11:20 Michael Monnerie
  2009-07-03 18:34 ` Eric Sandeen
@ 2009-07-04  5:43 ` Eric Sandeen
  2009-07-12 17:02   ` Michael Monnerie
  2009-07-12 18:52 ` Eric Sandeen
  2 siblings, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2009-07-04  5:43 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs mailing list

Michael Monnerie wrote:
> Tonight our server rebooted, and I found in /var/log/warn that it had been
> complaining a lot about XFS since June 7 already:

...

> But XFS didn't go offline, so nobody noticed these messages. There are a lot of them.
> They have obviously been generated since then by the nightly "xfs_fsr -v -t 7200"
> run. It would have been nice if xfs_fsr had printed a message, so we would have
> received the cron mail. (But it got killed by the kernel, which is a fair excuse.)

ok yeah we should see why fsr didn't print anything ...

> Anyway, I then ran xfs_repair (3.01) and got this:
> 
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
> [snip]
>         - agno = 14
> local inode 3857051697 attr too small (size = 3, min size = 4)
> bad attribute fork in inode 3857051697, clearing attr fork
> clearing inode 3857051697 attributes
> cleared inode 3857051697
> [snip]
> Phase 4 - check for duplicate blocks...
> [snip]
>         - agno = 15
> data fork in regular inode 3857051697 claims used block 537147998
> xfs_repair: dinode.c:2108: process_inode_data_fork: Assertion `err == 0' failed.

Ok, so this is essentially code which first does a scan-only pass; if that
finds an error it bails out and clears the inode, but if not, it calls
essentially the same function again - the comments say "set bitmaps this
time" - and on that 2nd call it finds an error, which isn't handled well.
The ASSERT(err == 0) is presumably there because if the first scan didn't
find anything, the 2nd call shouldn't either, but ... that's not the case
here :(  There are more checks that can go wrong -after- the scan-only portion.

So either the caller needs to cope with the error at this point, or the
scan-only pass needs to do all the checks, I think.
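
To make the failure mode concrete, here's a toy model of that shape - names
invented, not the actual dinode.c code; compile it with and without -DNDEBUG
to get the same contrast as an xfsprogs build with and without active ASSERTs:

#include <assert.h>
#include <stdio.h>

static int block_already_used = 1;	/* simulate a block another inode owns */

static int process_data_fork(int check_only)
{
	if (check_only)
		return 0;	/* scan-only pass: this problem isn't checked here */

	/* 2nd pass, "set bitmaps this time": now the duplicate is noticed */
	if (block_already_used) {
		printf("data fork claims used block\n");
		return 1;
	}
	return 0;
}

int main(void)
{
	int err;

	err = process_data_fork(1);		/* pass 1: check only */
	if (err) {
		printf("clearing inode\n");	/* the handled error path */
		return 0;
	}

	err = process_data_fork(0);		/* pass 2: set bitmaps */
	assert(err == 0);			/* trips, like dinode.c:2108 */
	return 0;
}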

Where's Barry when you need him ....

Also I need to look at when the ASSERTs are active and when they should
be; the Fedora packaged xfsprogs doesn't have the ASSERT active, and so
this doesn't trip.  After 2 calls to xfs_repair on Fedora, w/o the
ASSERTs active, it checks clean on the 3rd (!).  Not great.  Not sure
how much was cleared out in the process either...

-Eric



* Re: bad fs - xfs_repair 3.01 crashes on it
  2009-07-04  5:43 ` Eric Sandeen
@ 2009-07-12 17:02   ` Michael Monnerie
  2009-07-12 18:09     ` Eric Sandeen
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Monnerie @ 2009-07-12 17:02 UTC (permalink / raw)
  To: xfs



On Saturday 04 July 2009 Eric Sandeen wrote:
> Where's Barry when you need him ....

Who's that?

> Also I need to look at when the ASSERTs are active and when they
> should be; the Fedora packaged xfsprogs doesn't have the ASSERT
> active, and so this doesn't trip.  After 2 calls to xfs_repair on
> Fedora, w/o the ASSERTs active, it checks clean on the 3rd (!).  Not
> great.  Not sure how much was cleared out in the process either...

Any ideas/news on this? I'd like to xfs_repair that filesystem. It seems to
only hit one file, but I don't dare delete it in case that makes things
worse.

mfg zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4



* Re: bad fs - xfs_repair 3.01 crashes on it
  2009-07-12 17:02   ` Michael Monnerie
@ 2009-07-12 18:09     ` Eric Sandeen
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Sandeen @ 2009-07-12 18:09 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs

Michael Monnerie wrote:
> On Samstag 04 Juli 2009 Eric Sandeen wrote:
>> Where's Barry when you need him ....
> 
> Who's that?

The ex-SGI xfs_repair maintainer :)

>> Also I need to look at when the ASSERTs are active and when they
>> should be; the Fedora packaged xfsprogs doesn't have the ASSERT
>> active, and so this doesn't trip.  After 2 calls to xfs_repair on
>> Fedora, w/o the ASSERTs active, it checks clean on the 3rd (!).  Not
>> great.  Not sure how much was cleared out in the process either...
> 
> Any ideas/news on this? I'd like to xfs_repair that filesystem. It seems to
> only hit one file, but I don't dare delete it in case that makes things
> worse.
> 
> mfg zmi

Sorry, I will get back to this soon - today I hope.  I seem to be
getting more and more familiar w/ xfs_repair these days.  :)

If you do want to try deleting that one file or other such tricks, you
can do it on a sparse metadata image of the fs as a dry run:

# xfs_metadump -o /dev/whatever metadump.img
# xfs_mdrestore metadump.img filesystem.img
# mount -o loop filesystem.img mnt/
# <fiddle as you please>
# umount mnt/
# xfs_repair filesystem.img
# mount -o loop filesystem.img mnt/

and see what happens...

-Eric


* Re: bad fs - xfs_repair 3.01 crashes on it
  2009-07-03 11:20 Michael Monnerie
  2009-07-03 18:34 ` Eric Sandeen
  2009-07-04  5:43 ` Eric Sandeen
@ 2009-07-12 18:52 ` Eric Sandeen
  2009-07-12 22:08   ` Michael Monnerie
  2 siblings, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2009-07-12 18:52 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs mailing list

Michael Monnerie wrote:
> Tonight our server rebooted, and I found in /var/log/warn that it had been
> complaining a lot about XFS since June 7 already:
> 
> Jun  7 03:06:31 orion.i.zmi.at kernel: Filesystem "dm-0": corrupt inode 3857051697 ((a)extents = 5).  Unmount and run xfs_repair.
> Jun  7 03:06:31 orion.i.zmi.at kernel: Pid: 23230, comm: xfs_fsr Tainted: G          2.6.27.21-0.1-xen #1
> Jun  7 03:06:31 orion.i.zmi.at kernel:

Hm, the other sort of interesting thing here is that a recently-reported
RH bug:

[Bug 510823] "Structure needs cleaning" when reading files from an XFS
partition (extent count for ino XYZ data fork too low (6) for file format)

also seems to -possibly- be related to an xfs_fsr run, and also is
related to extents in the wrong format.  In that case it was the
opposite; an inode was found in btree format which had few enough
extents that it should have been in the extents format in the inode; in
your case, it looks like there were too many extents to fit in the
format it had...

Just out of curiosity, it looks like you have rather a lot of extended
attributes on at least the inode above, is that accurate?  Or maybe
that's part of the corruption?

I'll focus on getting xfs_repair to cope first, but I wonder what
happened here...

Thanks,
-Eric


* Re: bad fs - xfs_repair 3.01 crashes on it
  2009-07-12 18:52 ` Eric Sandeen
@ 2009-07-12 22:08   ` Michael Monnerie
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Monnerie @ 2009-07-12 22:08 UTC (permalink / raw)
  To: xfs

On Sunday 12 July 2009 Eric Sandeen wrote:
> Just out of curiosity, it looks like you have rather a lot of
> extended attributes on at least the inode above, is that accurate?
>  Or maybe that's part of the corruption?

# find . -inum 3857051697
find: "./samba/tmp/BettyPC.tib": Die Struktur muss bereinigt werden
(means: structure needs cleaning)

I'm not sure whether that message means this file has the corresponding inode
number. If it does, it's a backup of a PC made with Acronis.
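
(If the path could still be stat()ed at all, something like
# stat -c '%i %n' ./samba/tmp/BettyPC.tib
would confirm the inode number - though given the "structure needs cleaning"
error above, it probably can't be.)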

Normally I only use xattrs to set one or two extra permissions.

> I'll focus on getting xfs_repair to cope first, but I wonder what
> happened here...

No idea. Didn't have a crash on that server IIRC. I tried some "ls" 
and "getfacl" and got these kernel traces:

Jul 13 00:01:10 orion.i.zmi.at kernel: Filesystem "dm-0": corrupt inode 3857051697 ((a)extents = 5).  Unmount and run xfs_repair.
Jul 13 00:01:10 orion.i.zmi.at kernel: Pid: 17213, comm: find Tainted: G          2.6.27.23-0.1-xen #1
Jul 13 00:01:10 orion.i.zmi.at kernel:
Jul 13 00:01:10 orion.i.zmi.at kernel: Call Trace:
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffff8020c597>] show_trace_log_lvl+0x41/0x58
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffff80463a33>] dump_stack+0x69/0x6f
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffffa033abf8>] xfs_iformat_extents+0xc9/0x1c4 [xfs]
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffffa033b153>] xfs_iformat+0x2b0/0x3f7 [xfs]
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffffa033b381>] xfs_iread+0xe7/0x1ee [xfs]
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffffa0336928>] xfs_iget_core+0x3a5/0x63a [xfs]
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffffa0336c9f>] xfs_iget+0xe2/0x187 [xfs]
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffffa0350941>] xfs_lookup+0x79/0xa5 [xfs]
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffffa035955b>] xfs_vn_lookup+0x3c/0x78 [xfs]
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffff802a5d27>] real_lookup+0x7e/0x10f
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffff802a5e1b>] do_lookup+0x63/0xb6
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffff802a84a8>] __link_path_walk+0x9f4/0xe58
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffff802a8ad7>] path_walk+0x5e/0xb9
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffff802a8c94>] do_path_lookup+0x162/0x1b9
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffff802a962e>] user_path_at+0x48/0x79
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffff802a1b65>] vfs_lstat_fd+0x15/0x41
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffff802a1c68>] sys_newfstatat+0x22/0x43
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<ffffffff8020b3b8>] system_call_fastpath+0x16/0x1b
Jul 13 00:01:10 orion.i.zmi.at kernel:  [<00007f802a89f4ce>] 0x7f802a89f4ce
Jul 13 00:01:10 orion.i.zmi.at kernel:
Jul 13 00:02:35 orion.i.zmi.at kernel: Filesystem "dm-0": corrupt inode 3857051697 ((a)extents = 5).  Unmount and run xfs_repair.
Jul 13 00:02:35 orion.i.zmi.at kernel: Pid: 17232, comm: getfacl Tainted: G          2.6.27.23-0.1-xen #1
Jul 13 00:02:35 orion.i.zmi.at kernel:
Jul 13 00:02:35 orion.i.zmi.at kernel: Call Trace:
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffff8020c597>] show_trace_log_lvl+0x41/0x58
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffff80463a33>] dump_stack+0x69/0x6f
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffffa033abf8>] xfs_iformat_extents+0xc9/0x1c4 [xfs]
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffffa033b153>] xfs_iformat+0x2b0/0x3f7 [xfs]
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffffa033b381>] xfs_iread+0xe7/0x1ee [xfs]
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffffa0336928>] xfs_iget_core+0x3a5/0x63a [xfs]
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffffa0336c9f>] xfs_iget+0xe2/0x187 [xfs]
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffffa0350941>] xfs_lookup+0x79/0xa5 [xfs]
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffffa035955b>] xfs_vn_lookup+0x3c/0x78 [xfs]
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffff802a5d27>] real_lookup+0x7e/0x10f
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffff802a5e1b>] do_lookup+0x63/0xb6
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffff802a84a8>] __link_path_walk+0x9f4/0xe58
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffff802a8ad7>] path_walk+0x5e/0xb9
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffff802a8c94>] do_path_lookup+0x162/0x1b9
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffff802a962e>] user_path_at+0x48/0x79
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffff802a1b65>] vfs_lstat_fd+0x15/0x41
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffff802a1baa>] sys_newlstat+0x19/0x31
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<ffffffff8020b3b8>] system_call_fastpath+0x16/0x1b
Jul 13 00:02:35 orion.i.zmi.at kernel:  [<00007f9a8d911225>] 0x7f9a8d911225

Jul 11 03:02:53 orion.i.zmi.at kernel: Filesystem "dm-0": corrupt inode 3857051697 ((a)extents = 5).  Unmount and run xfs_repair.
Jul 11 03:02:53 orion.i.zmi.at kernel: Pid: 2881, comm: xfs_fsr Tainted: G          2.6.27.23-0.1-xen #1
Jul 11 03:02:53 orion.i.zmi.at kernel:
Jul 11 03:02:53 orion.i.zmi.at kernel: Call Trace:
Jul 11 03:02:53 orion.i.zmi.at kernel:  [<ffffffff8020c597>] show_trace_log_lvl+0x41/0x58
Jul 11 03:02:53 orion.i.zmi.at kernel:  [<ffffffff80463a33>] dump_stack+0x69/0x6f
Jul 11 03:02:53 orion.i.zmi.at kernel:  [<ffffffffa033abf8>] xfs_iformat_extents+0xc9/0x1c4 [xfs]
Jul 11 03:02:53 orion.i.zmi.at kernel:  [<ffffffffa033b153>] xfs_iformat+0x2b0/0x3f7 [xfs]
Jul 11 03:02:53 orion.i.zmi.at kernel:  [<ffffffffa033b381>] xfs_iread+0xe7/0x1ee [xfs]
Jul 11 03:02:53 orion.i.zmi.at kernel:  [<ffffffffa0336928>] xfs_iget_core+0x3a5/0x63a [xfs]
Jul 11 03:02:53 orion.i.zmi.at kernel:  [<ffffffffa0336c9f>] xfs_iget+0xe2/0x187 [xfs]
Jul 11 03:02:53 orion.i.zmi.at kernel:  [<ffffffffa0358336>] xfs_vget_fsop_handlereq+0xc2/0x11b [xfs]
Jul 11 03:02:53 orion.i.zmi.at kernel:  [<ffffffffa03583ef>] xfs_open_by_handle+0x60/0x1cb [xfs]
Jul 11 03:02:53 orion.i.zmi.at kernel:  [<ffffffffa0358f9e>] xfs_ioctl+0x3ca/0x680 [xfs]
Jul 11 03:02:53 orion.i.zmi.at kernel:  [<ffffffffa035702a>] xfs_file_ioctl+0x25/0x69 [xfs]
Jul 11 03:02:53 orion.i.zmi.at kernel:  [<ffffffff802aab39>] vfs_ioctl+0x21/0x6c
Jul 11 03:02:53 orion.i.zmi.at kernel:  [<ffffffff802aada6>] do_vfs_ioctl+0x222/0x231
Jul 11 03:02:53 orion.i.zmi.at kernel:  [<ffffffff802aae06>] sys_ioctl+0x51/0x73
Jul 11 03:02:53 orion.i.zmi.at kernel:  [<ffffffff8020b3b8>] system_call_fastpath+0x16/0x1b
Jul 11 03:02:53 orion.i.zmi.at kernel:  [<00007fc76ba0bb77>] 0x7fc76ba0bb77

I also found this one:
Jul 12 00:01:29 orion.i.zmi.at kernel: Filesystem "dm-0": corrupt inode 3857051697 ((a)extents = 5).  Unmount and run xfs_repair.
Jul 12 00:01:29 orion.i.zmi.at kernel: 00000000: 49 4e 81 ff 02 02 00 00 00 00 03 e8 00 00 00 64  IN.............d
Jul 12 00:01:29 orion.i.zmi.at kernel: Filesystem "dm-0": XFS internal error xfs_iformat_extents(1) at line 565 of file fs/xfs/xfs_inode.c.  Caller 0xffffffffa033b153
Jul 12 00:01:29 orion.i.zmi.at kernel: Pid: 9592, comm: find Tainted: G          2.6.27.23-0.1-xen #1
Jul 12 00:01:29 orion.i.zmi.at kernel:
Jul 12 00:01:29 orion.i.zmi.at kernel: Call Trace:
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffff8020c597>] show_trace_log_lvl+0x41/0x58
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffff80463a33>] dump_stack+0x69/0x6f
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffffa033abf8>] xfs_iformat_extents+0xc9/0x1c4 [xfs]
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffffa033b153>] xfs_iformat+0x2b0/0x3f7 [xfs]
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffffa033b381>] xfs_iread+0xe7/0x1ee [xfs]
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffffa0336928>] xfs_iget_core+0x3a5/0x63a [xfs]
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffffa0336c9f>] xfs_iget+0xe2/0x187 [xfs]
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffffa0350941>] xfs_lookup+0x79/0xa5 [xfs]
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffffa035955b>] xfs_vn_lookup+0x3c/0x78 [xfs]
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffff802a5d27>] real_lookup+0x7e/0x10f
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffff802a5e1b>] do_lookup+0x63/0xb6
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffff802a84a8>] __link_path_walk+0x9f4/0xe58
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffff802a8ad7>] path_walk+0x5e/0xb9
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffff802a8c94>] do_path_lookup+0x162/0x1b9
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffff802a962e>] user_path_at+0x48/0x79
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffff802a1b65>] vfs_lstat_fd+0x15/0x41
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffff802a1c68>] sys_newfstatat+0x22/0x43
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<ffffffff8020b3b8>] system_call_fastpath+0x16/0x1b
Jul 12 00:01:29 orion.i.zmi.at kernel:  [<00007fdc209084ce>] 0x7fdc209084ce


mfg zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4


* Re: bad fs - xfs_repair 3.01 crashes on it
@ 2009-08-31  6:40 Michael Monnerie
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Monnerie @ 2009-08-31  6:40 UTC (permalink / raw)
  To: xfs



On Sunday 12 July 2009 Eric Sandeen wrote:
> If you do want to try deleting that one file or other such tricks,
> you can do it on a sparse metadata image of the fs as a dry run:
>
> # xfs_metadump -o /dev/whatever metadump.img
> # xfs_mdrestore metadump.img filesystem.img
> # mount -o loop filesystem.img mnt/
> # <fiddle as you please>
> # umount mnt/
> # xfs_repair filesystem.img
> # mount -o loop filesystem.img mnt/
>
> and see what happens...

To pick up this old thread again, here is what I did now:

* make a metadump
* mount it
* remove unneeded files/dirs

This already produced lots of errors where files/dirs couldn't be
deleted. I made a metadump of this state again; it's at
http://zmi.at/xfs.metadump-brokenonly.bz2

Then I tried with v3.0.1:
# xfs_repair xfs.img 
Phase 1 - find and verify superblock...      
Phase 2 - using internal log                 
        - zero log...                        
        - scan filesystem freespace and inode maps...
        - found root inode chunk                     
Phase 3 - for each AG...                             
        - scan and clear agi unlinked lists...       
        - process known inodes and perform inode discovery...
        - agno = 0                                           
        - agno = 1                                           
        - agno = 2                                           
        - agno = 3                                           
        - agno = 4                                           
        - agno = 5                                           
        - agno = 6                                           
        - agno = 7                                           
        - agno = 8                                           
        - agno = 9                                           
        - agno = 10                                          
        - agno = 11                                          
        - agno = 12                                          
        - agno = 13                                          
        - agno = 14                                          
local inode 3857051697 attr too small (size = 3, min size = 4)
bad attribute fork in inode 3857051697, clearing attr fork    
clearing inode 3857051697 attributes                          
cleared inode 3857051697                                      
        - agno = 15                                           
        - agno = 16                                           
        - agno = 17                                           
        - agno = 18                                           
        - agno = 19                                           
        - agno = 20                                           
        - agno = 21                                           
        - agno = 22                                           
        - agno = 23                                           
        - agno = 24                                           
        - agno = 25                                           
        - agno = 26                                           
        - agno = 27                                           
        - agno = 28                                           
        - agno = 29                                           
        - agno = 30                                           
        - agno = 31                                           
        - agno = 32                                           
        - agno = 33                                           
        - agno = 34                                           
        - agno = 35                                           
        - agno = 36                                           
        - agno = 37                                           
        - agno = 38                                           
        - agno = 39                                           
        - process newly discovered inodes...                  
Phase 4 - check for duplicate blocks...                       
        - setting up duplicate extent list...                 
        - check for inodes claiming duplicate blocks...       
        - agno = 0                                            
        - agno = 1                                            
        - agno = 2                                            
        - agno = 3                                            
        - agno = 4                                            
        - agno = 5                                            
        - agno = 6                                            
        - agno = 7                                            
        - agno = 8                                            
        - agno = 9                                            
        - agno = 10                                           
        - agno = 11                                           
        - agno = 12                                           
        - agno = 13                                           
        - agno = 14                                           
        - agno = 15                                           
        - agno = 16                                           
        - agno = 17                                           
        - agno = 18                                           
data fork in regular inode 3857051697 claims used block 537546384
        - agno = 19                                              
xfs_repair: dinode.c:2108: process_inode_data_fork: Zusicherung »err == 
0« nicht erfüllt.
Abgebrochen
(German locale; means: Assertion `err == 0' failed. / Aborted)

So I patched out the ASSERT at dinode.c:2108.
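The change is just commenting out that assertion, roughly like this (from
memory, not a proper patch):

--- repair/dinode.c	(around line 2108, in process_inode_data_fork)
+++ repair/dinode.c
-	ASSERT(err == 0);
+	/* ASSERT(err == 0); */

With the assertion gone, the next run gave: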

# xfs_repair xfs.img                                             
Phase 1 - find and verify superblock...                                                  
Phase 2 - using internal log                                                             
        - zero log...                                                                    
        - scan filesystem freespace and inode maps...                                    
        - found root inode chunk                                                         
Phase 3 - for each AG...                                                                 
        - scan and clear agi unlinked lists...                                           
        - process known inodes and perform inode discovery...                            
        - agno = 0                                                                       
        - agno = 1                                                                       
        - agno = 2                                                                       
        - agno = 3                                                                       
        - agno = 4                                                                       
        - agno = 5                                                                       
        - agno = 6                                                                       
        - agno = 7                                                                       
        - agno = 8                                                                       
        - agno = 9                                                                       
        - agno = 10                                                                      
        - agno = 11                                                                      
        - agno = 12                                                                      
        - agno = 13                                                                      
        - agno = 14                                                                      
local inode 3857051697 attr too small (size = 3, min size = 4)                           
bad attribute fork in inode 3857051697, clearing attr fork                               
clearing inode 3857051697 attributes                                                     
cleared inode 3857051697                                                                 
        - agno = 15                                                                      
        - agno = 16                                                                      
        - agno = 17                                                                      
        - agno = 18                                                                      
        - agno = 19                                                                      
        - agno = 20                                                                      
        - agno = 21                                                                      
        - agno = 22                                                                      
        - agno = 23                                                                      
        - agno = 24                                                                      
        - agno = 25                                                                      
        - agno = 26                                                                      
        - agno = 27                                                                      
        - agno = 28                                                                      
        - agno = 29                                                                      
        - agno = 30                                                                      
        - agno = 31                                                                      
        - agno = 32                                                                      
        - agno = 33                                                                      
        - agno = 34                                                                      
        - agno = 35                                                                      
        - agno = 36                                                                      
        - agno = 37                                                                      
        - agno = 38                                                                      
        - agno = 39                                                                      
        - process newly discovered inodes...                                             
Phase 4 - check for duplicate blocks...                                                  
        - setting up duplicate extent list...                                            
        - check for inodes claiming duplicate blocks...                                  
        - agno = 0                                                                       
        - agno = 1                                                                       
        - agno = 2                                                                       
        - agno = 3                                                                       
        - agno = 4                                                                       
        - agno = 5                                                                       
        - agno = 6                                                                       
        - agno = 7                                                                       
        - agno = 8                                                                       
        - agno = 9                                                                       
        - agno = 10                                                                      
        - agno = 11                                                                      
        - agno = 12                                                                      
        - agno = 13                                                                      
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
data fork in regular inode 3857051697 claims used block 537546384
bad attribute format 1 in inode 3857051697, resetting value
correcting nblocks for inode 3857051697, was 10135251 - counted 8388604
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
data fork in regular inode 6174936063 claims used block 537240415
correcting nblocks for inode 6174936063, was 1 - counted 0
data fork in regular inode 6180186880 claims used block 537242879
correcting nblocks for inode 6180186880, was 1 - counted 0
        - agno = 25
        - agno = 26
        - agno = 27
data fork in regular inode 7257143306 claims used block 537251790
correcting nblocks for inode 7257143306, was 1 - counted 0
data fork in regular inode 7257143307 claims used block 537257951
correcting nblocks for inode 7257143307, was 1 - counted 0
data fork in regular inode 6720520457 claims used block 537246687
correcting nblocks for inode 6720520457, was 1 - counted 0
data fork in regular inode 6720520458 claims used block 537247327
correcting nblocks for inode 6720520458, was 1 - counted 0
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
data fork in regular inode 8326467385 claims used block 537198367
correcting nblocks for inode 8326467385, was 1 - counted 0
        - agno = 32
inode block 537201328 multiply claimed, state was 3
inode block 537201329 multiply claimed, state was 3
inode block 537201330 multiply claimed, state was 3
data fork in regular inode 8595221283 claims used block 537201683
correcting nblocks for inode 8595221283, was 1 - counted 0
data fork in regular inode 8595221284 claims used block 537201684
        - agno = 33
correcting nblocks for inode 8595221284, was 5 - counted 0
data fork in regular inode 8595221285 claims used block 537201689
correcting nblocks for inode 8595221285, was 1 - counted 0
data fork in regular inode 8595221286 claims used block 537201690
correcting nblocks for inode 8595221286, was 6 - counted 0
data fork in regular inode 8326763299 claims used block 537270223
correcting nblocks for inode 8326763299, was 1 - counted 0
data fork in regular inode 8326763300 claims used block 537271439
correcting nblocks for inode 8326763300, was 1 - counted 0
data fork in regular inode 8595221287 claims used block 537201696
correcting nblocks for inode 8595221287, was 1 - counted 0
data fork in regular inode 8595221288 claims used block 537201699
attr fork in regular inode 8595221288 claims used block 537201698
xfs_repair: dinode.c:2241: process_inode_attr_fork: Zusicherung »err == 
0« nicht erfüllt.
(means: Assertion `err == 0' failed)
data fork in regular inode 8058708772 claims used block 537258543
correcting nblocks for inode 8058708772, was 1 - counted 0
data fork in regular inode 8058708773 claims used block 537260719
correcting nblocks for inode 8058708773, was 1 - counted 0
Abgebrochen (means: Aborted)

So I again patched the ASSERT at dinode.c:2241 away:
# xfs_repair xfs.img
(about 550KB output, see http://zmi.at/xfs_repair.txt )
corrupt dinode 8326467385, extent total = 1, nblocks = 0.  This is a 
bug.
Please capture the filesystem metadata with xfs_metadump and
report it to xfs@oss.sgi.com.
cache_node_purge: refcount was 1, not zero (node=0x7f90c4de34e0)

fatal error -- couldn't map inode 8326467385, err = 117

Now, should I PANIC? None of this looks good...
I mounted this image and ran "rm -r *", so all files/dirs that were still
good got deleted. There are still a lot left that cannot be deleted. Can
someone help me fix this, please?
I made another metadump image; it's at
http://zmi.at/xfs.metadump-brokenonly2.bz2

mfg zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4



