* xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Richard Hartmann @ 2010-02-12 14:07 UTC (permalink / raw)
To: xfs
Hi all,
one of our RAIDs recovered into an inconsistent state and I need to get
back what data I can.
All steps were, and will continue to be, taken under Linux, using
Debian and/or grml.
The XFS file system that cannot be mounted is in a partition that is
1.5 TiB in size.
I put in an additional 2 TiB drive and used dd_rescue to copy over what the
RAID presents as one large block device.
I then ran kpartx on the image to create the appropriate devices in
/dev/mapper.
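This image-first approach can be sketched roughly as below. The real device and image paths are specific to the machine (assumptions here), so this demonstration runs against throwaway scratch files, with plain dd standing in for dd_rescue (which additionally keeps going past read errors on a failing array):

```shell
# Sketch of the imaging-first recovery step, demonstrated on scratch files.
# On the real system, the source would be the RAID block device (e.g.
# /dev/sdX -- an assumption) and dd_rescue would be used so that read
# errors on the dying array do not abort the copy.
src=$(mktemp)
img=$(mktemp)
dd if=/dev/urandom of="$src" bs=1024 count=4 2>/dev/null   # stand-in "device"
dd if="$src" of="$img" bs=1024 conv=noerror,sync 2>/dev/null
cmp -s "$src" "$img" && echo "image matches source"
rm -f "$src" "$img"
```

All subsequent repair attempts then operate on the image (or the kpartx mappings on top of it), never on the original device.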
As the metadata was corrupt, I ran xfs_metadump -g and -go to have a
personal backup and one I could give out if needed.
I then ran xfs_repair -L /dev/mapper/loop1p3 which has not been making
any progress for the last hour.
I don't see much load in the way of CPU or I/O, yet there is literally no
progress at all. I am hanging at:
no . entry for directory 39756
no .. entry for directory 39756
problem with directory contents in inode 39756
cleared inode 39756
data fork in ino 39771 claims free block 8391321
bad directory block magic # 0x58443242 in block 0 for directory inode 39771
corrupt block 0 in directory inode 39771
will junk block
no . entry for directory 39771
no .. entry for directory 39771
problem with directory contents in inode 39771
cleared inode 39771
data fork in ino 39774 claims free block 44122
data fork in ino 39774 claims free block 8432730
bad directory block magic # 0x1f1f1f1f in block 0 for directory inode 39774
corrupt block 0 in directory inode 39774
will junk block
no . entry for directory 39774
no .. entry for directory 39774
problem with directory contents in inode 39774
cleared inode 39774
multiple . entries in directory inode 39816: clearing entry
multiple .. entries in directory inode 39816: clearing entry
bad directory block magic # 0x58443244 in block 0 for directory inode 39840
corrupt block 0 in directory inode 39840
will junk block
no . entry for directory 39840
no .. entry for directory 39840
problem with directory contents in inode 39840
cleared inode 39840
bad . entry in directory inode 39909, was 45188: correcting
data fork in ino 40102 claims free block 8400946
ab3e1b90: Badness in key lookup (length)
bp=(bno 258176, len 8192 bytes) key=(bno 258176, len 4096 bytes)
ab3e1b90: Badness in key lookup (length)
bp=(bno 258224, len 8192 bytes) key=(bno 258224, len 4096 bytes)
Is there any way to read out the current state of xfs_repair and tell
what it is doing and when it will finish? I am aware that the end time
is impossible to predict, I am just looking for ballpark figures.
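One crude progress signal, short of attaching a debugger, is to sample the process's cumulative I/O counters from /proc: if read_bytes keeps growing between samples, repair is slow rather than stuck. A rough sketch (assumes a Linux /proc; substitute the xfs_repair PID for `$$`, which here just samples the shell itself to show the mechanism):

```shell
# Sample /proc/<pid>/io twice; a growing read_bytes means the process is
# still doing storage reads. Replace $$ with the xfs_repair PID on a real
# run (assumption: Linux with per-task I/O accounting enabled).
pid=$$
read_bytes() { awk '/^read_bytes:/ {print $2}' /proc/"$1"/io 2>/dev/null; }
before=$(read_bytes "$pid"); before=${before:-0}
sleep 1
after=$(read_bytes "$pid"); after=${after:-0}
echo "read_bytes delta: $((after - before))"
```

A delta of zero over many minutes, combined with no CPU load, is a strong hint the process is blocked rather than working.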
Thanks a _lot_ for any and all help,
Richard
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Eric Sandeen @ 2010-02-12 15:28 UTC (permalink / raw)
To: Richard Hartmann; +Cc: xfs

Richard Hartmann wrote:
> Hi all,
>
> one of our RAIDs recovered into an inconsistant state and I need to get
> back what data I can.
> [...]
> Is there any way to read out the current state of xfs_repair and tell
> what it is doing and when it will finish? I am aware that the end time
> is impossible to predict, I am just looking for ballpark figures.

To see if it's moving, I'd try stracing the process or examining it
with gdb if you are able.

You didn't say which version of xfsprogs you have; IIRC at least one known
problem with hanging has been fixed upstream.

-Eric
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Richard Hartmann @ 2010-02-12 16:45 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs

On Fri, Feb 12, 2010 at 16:28, Eric Sandeen <sandeen@sandeen.net> wrote:

> to see if it's moving, I'd try stracing the process or examining it
> with gdb if you are able.

I thought about that, but I fear that I will flood my terminal with crap
and/or that I will somehow impair xfs_repair while doing so.

> You didn't say which version of xfsprogs you have; IIRC at least one known
> problem with hanging has been fixed upstream.

root@grml ~ # xfs_repair -V
xfs_repair version 3.0.4
root@grml ~ #

I seriously do hope that I am not affected...

Thanks,
Richard

PS: If you need any other debug info, please tell me.
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Richard Hartmann @ 2010-02-12 17:02 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs

On Fri, Feb 12, 2010 at 17:45, Richard Hartmann
<richih.mailinglist@gmail.com> wrote:

> I thought about that, but I fear that I will flood my terminal off with crap
> and/or that I somehow impair xfs_repair while doing so.

Running it for ten minutes gives me:

root@grml ~ # strace -p13629
Process 13629 attached - interrupt to quit
futex(0xa381b4cc, FUTEX_WAIT_PRIVATE, 2, NULL^C <unfinished ...>
Process 13629 detached
root@grml ~ #

Richard
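A futex wait like the one above means a thread is parked on a contended pthread mutex, which is consistent with a hung process showing neither CPU nor I/O load. The kernel-side picture can also be read straight from /proc without attaching strace or gdb; a sketch against a throwaway process (on a real run, substitute the xfs_repair PID for `$pid`):

```shell
# Inspect a process's scheduler state and kernel wait channel via /proc.
# State 'S' means interruptible sleep -- what a futex-blocked thread shows.
# 'sleep' is only a stand-in here for the stuck xfs_repair process.
sleep 30 &
pid=$!
sleep 1                        # give it a moment to block in the kernel
state=$(awk '/^State:/ {print $2}' /proc/"$pid"/status)
echo "state: $state"
echo "wchan: $(cat /proc/"$pid"/wchan 2>/dev/null)"
kill "$pid" 2>/dev/null
```

Attaching and detaching strace, as done above, is read-only with respect to the traced process, so it does not risk corrupting the repair.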
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Nicolas Stransky @ 2010-02-12 17:31 UTC (permalink / raw)
To: linux-xfs

I'm running into exactly the same problem, except that it's not 10
minutes, but DAYS.

What is the version that fixes this?

Thanks

On 2/12/10 12:02 PM, Richard Hartmann wrote:
> Running it for ten minutes gives me:
>
> root@grml ~ # strace -p13629
> Process 13629 attached - interrupt to quit
> futex(0xa381b4cc, FUTEX_WAIT_PRIVATE, 2, NULL^C <unfinished ...>
> Process 13629 detached
> root@grml ~ #

--
Nico
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Eric Sandeen @ 2010-02-12 17:50 UTC (permalink / raw)
To: Nicolas Stransky; +Cc: linux-xfs

Nicolas Stransky wrote:
> I'm running into the same problem exactly, except that it's not 10
> minutes, but DAYS.
>
> What is the version that fixes this?

Hard to say without knowing for sure what version you're using, and
what exactly "this" is that you're seeing :)

Providing an xfs_metadump of the corrupted fs that hangs repair
is also about the best thing you could do for investigation,
if you've already determined that the latest release doesn't help.

-Eric
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Nicolas Stransky @ 2010-02-12 18:16 UTC (permalink / raw)
To: linux-xfs

Right :)

I'm using xfsprogs 3.1.0, on Debian Lenny, because 3.1.1 fails to build
for some reason.

I've been trying to repair a 1.4TB filesystem for more than a week,
without success. xfs_repair never completes. I straced the stuck process
today and got:

Process 28510 attached - interrupt to quit
futex(0x429cb58, FUTEX_WAIT_PRIVATE, 2, NULL

I'm running xfs_repair -P -t5 -m500 /dev/sda1 because the machine only
has 2GB of RAM and was swapping like crazy. I'll see how it goes with
the -P option.

I'm happy to provide xfs_metadump if this still hangs!

Thanks,
Nico

On 2/12/10 12:50 PM, Eric Sandeen wrote:
> hard to say without knowing for sure what version you're using, and
> what exactly "this" is that you're seeing :)
>
> Providing an xfs_metadump of the corrupted fs that hangs repair
> is also about the best thing you could do for investigation,
> if you've already determined that the latest release doesn't help.
>
> -Eric
> [...]

--
Nico
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Nicolas Stransky @ 2010-02-12 20:30 UTC (permalink / raw)
To: linux-xfs

The -P option allowed me to complete xfs_repair successfully, but it
also deleted almost everything on the filesystem... A directory that I
could mount -o ro,norecovery before, with tens of GB of data in it, is
now empty... That is really too bad.

Thanks for your help.

On 2/12/10 1:16 PM, Nicolas Stransky wrote:
> [...]

--
Nico
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Richard Hartmann @ 2010-02-12 20:39 UTC (permalink / raw)
To: Nicolas Stransky; +Cc: linux-xfs

On Fri, Feb 12, 2010 at 21:30, Nicolas Stransky <nico@stransky.cx> wrote:

> The -P option allowed me to complete xfs_repair successfully, but it
> also deleted almost everything on the filesystem... A directory that I
> could mount -o ro,norecovery before, with tens of GB of data in it, is
> now empty... That is really too bad.

Do you still have a backup of the raw disk?

Richard
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Nicolas Stransky @ 2010-02-12 22:21 UTC (permalink / raw)
To: linux-xfs

On 2/12/10 3:39 PM, Richard Hartmann wrote:
> Do you still have a backup of the raw disk?

Unfortunately I don't. I think it's just lost now.

--
Nico
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Eric Sandeen @ 2010-02-12 20:48 UTC (permalink / raw)
To: Nicolas Stransky; +Cc: linux-xfs

Nicolas Stransky wrote:
> The -P option allowed me to complete xfs_repair successfully, but it
> also deleted almost everything on the filesystem... A directory that I
> could mount -o ro,norecovery before, with tens of GB of data in it, is
> now empty... That is really too bad.

Any idea what happened to the fs - what prompted the repair?

Did you save the logs, or a preliminary metadump? I wonder what it found.

The files aren't in lost+found/ ?

> Thanks for your help.

Such as it was, I guess...

-Eric
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Nicolas Stransky @ 2010-02-12 22:27 UTC (permalink / raw)
To: linux-xfs

On 2/12/10 3:48 PM, Eric Sandeen wrote:
> Any idea what happened to the fs - what prompted the repair?

Yes, it's a RAID 5 array that failed pretty badly due to multiple
reboots of the machine. Some disks failed and at some point the RAID
card stopped detecting some of the disks, or failed to assemble the
RAID array correctly. 3ware (LSI) issued a fix that allowed the array
to start in degraded mode, but the xfs filesystem seemed in pretty bad
shape at that point. xfs_repair was failing right at the beginning and
I had to xfs_repair -L. Then xfs_repair was hanging forever. It's only
today, by using -P, that I got it to complete.

> Did you save the logs, or a preliminary metadump?

No I did not.

> I wonder what it found. The files aren't in lost+found/ ?

There are tons of gigs of files in lost+found but I can't really sort
them and figure out which one is which at this point...

Thanks,

--
Nico
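Triaging a lost+found full of anonymous inode-numbered files usually starts with grouping them by their leading magic bytes. A hedged sketch using only POSIX tools, demonstrated on a scratch directory (point `$dir` at the real lost+found, read-only, on an actual run):

```shell
# Group files by their first four bytes (magic numbers), which is often
# enough to separate archives, images, mail, etc. Demonstrated on a
# scratch directory; the file names mimic bare inode numbers.
dir=$(mktemp -d)
printf 'plain text file\n' > "$dir/12345"
printf 'PK\003\004'        > "$dir/67890"   # zip-style magic bytes
for f in "$dir"/*; do
  magic=$(od -An -tx1 -N4 "$f" | tr -d ' \n')
  printf '%s  %s\n' "$magic" "$f"
done
rm -rf "$dir"
```

Sorting that output and counting duplicates (`sort | uniq -c`) gives a quick census of what kinds of data survived; `file(1)`, where available, does the same job with friendlier labels.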
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Richard Hartmann @ 2010-02-12 20:01 UTC (permalink / raw)
To: Eric Sandeen; +Cc: linux-xfs, Nicolas Stransky

On Fri, Feb 12, 2010 at 18:50, Eric Sandeen <sandeen@sandeen.net> wrote:

> hard to say without knowing for sure what version you're using, and
> what exactly "this" is that you're seeing :)

3.0.4 - I stated that in another subthread so it might have gotten lost.

> Providing an xfs_metadump of the corrupted fs that hangs repair
> is also about the best thing you could do for investigation,
> if you've already determined that the latest release doesn't help.

http://dediserver.eu/misc/mailstore_metadata_obscured__after_xfs_repair_hang.bz2
http://dediserver.eu/misc/mailstore_metadata_obscured.bz2

These logs will stay up for at least a week or three.

Richard
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Eric Sandeen @ 2010-02-12 20:11 UTC (permalink / raw)
To: Richard Hartmann; +Cc: linux-xfs, Nicolas Stransky

Richard Hartmann wrote:
> On Fri, Feb 12, 2010 at 18:50, Eric Sandeen <sandeen@sandeen.net> wrote:
>
>> hard to say without knowing for sure what version you're using, and
>> what exactly "this" is that you're seeing :)
>
> 3.0.4 - I stated that in another subthread so it might have gotten lost.

That question was for Nicolas ;)

-Eric
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Eric Sandeen @ 2010-02-12 22:19 UTC (permalink / raw)
To: Richard Hartmann; +Cc: linux-xfs, Nicolas Stransky

Richard Hartmann wrote:
> [...]
> http://dediserver.eu/misc/mailstore_metadata_obscured__after_xfs_repair_hang.bz2
> http://dediserver.eu/misc/mailstore_metadata_obscured.bz2
>
> These logs will stay up for at least a week or three.

Ok, it's hung in here it seems:

(gdb) bt
#0  0x0000003df2e0ce74 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003df2e08874 in _L_lock_106 () from /lib64/libpthread.so.0
#2  0x0000003df2e082e0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000004310d9 in libxfs_getbuf (device=<value optimized out>,
    blkno=<value optimized out>, len=<value optimized out>) at rdwr.c:394
#4  0x000000000043110d in libxfs_readbuf (dev=140518781147480, blkno=128,
    len=-220135752, flags=-1) at rdwr.c:483
#5  0x0000000000413d94 in da_read_buf (mp=0x7fff54dbcb70, nex=1,
    bmp=<value optimized out>) at dir2.c:110
#6  0x0000000000415b30 in process_block_dir2 (mp=0x7fff54dbcb70, ino=128,
    dip=0x7fcd14080e00, ino_discovery=1, dino_dirty=<value optimized out>,
    dirname=0x464619 "", parent=0x7fff54dbca10, blkmap=0x1c19dd0,
    dot=0x7fff54dbc6fc, dotdot=0x7fff54dbc6f8, repair=0x7fff54dbc6f4)
    at dir2.c:1697
#7  0x00000000004161ac in process_dir2 (mp=0x7fff54dbcb70, ino=128,
    dip=0x7fcd14080e00, ino_discovery=1, dino_dirty=0x7fff54dbca20,
    dirname=0x464619 "", parent=0x7fff54dbca10, blkmap=0x1c19dd0)
    at dir2.c:2084
#8  0x000000000040e422 in process_dinode_int (mp=0x7fff54dbcb70,
    dino=0x7fcd14080e00, agno=0, ino=128, was_free=0, dirty=0x7fff54dbca20,
    used=0x7fff54dbca24, verify_mode=0, uncertain=0, ino_discovery=1,
    check_dups=0, extra_attr_check=1, isa_dir=0x7fff54dbca1c,
    parent=0x7fff54dbca10) at dinode.c:2661
#9  0x000000000040e79e in process_dinode (mp=0x7fcd1408d958, dino=0x80,
    agno=4074831544, ino=4294967295, was_free=28730568, dirty=0x464619,
    used=0x7fff54dbca24, ino_discovery=1, check_dups=0, extra_attr_check=1,
    isa_dir=0x7fff54dbca1c, parent=0x7fff54dbca10) at dinode.c:2772
#10 0x0000000000408483 in process_inode_chunk (mp=0x7fff54dbcb70, agno=0,
    num_inos=<value optimized out>, first_irec=0x1b77930, ino_discovery=1,
    check_dups=0, extra_attr_check=1, bogus=0x7fff54dbcaa4)
    at dino_chunks.c:777
#11 0x0000000000408b22 in process_aginodes (mp=0x7fff54dbcb70,
    pf_args=0x361bae0, agno=0, ino_discovery=1, check_dups=0,
    extra_attr_check=1) at dino_chunks.c:1024
#12 0x000000000041a4ef in process_ag_func (wq=0x1d65a00, agno=0,
    arg=0x361bae0) at phase3.c:154
#13 0x000000000041ab55 in phase3 (mp=0x7fff54dbcb70) at phase3.c:193
#14 0x000000000042d5a1 in main (argc=<value optimized out>,
    argv=<value optimized out>) at xfs_repair.c:712

And you're right, it's not progressing. The filesystem is a real mess,
but it's also making repair pretty unhappy :)

  1st run hangs
  2nd run completes with -P
  next run resets more link counts
  run after that segfaults :(

And just a warning: post-repair, about 22% of the files are in lost+found.

It'd take a bit of dedicated time to sort out the issues in repair here;
we need to do it, but somebody's going to have to find the time ...

-Eric
[parent not found: <20100212165801.GA7323@puku.stupidest.org>]
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Richard Hartmann @ 2010-02-12 17:04 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: xfs

On Fri, Feb 12, 2010 at 17:58, Chris Wedgwood <cw@f00f.org> wrote:

> try adding -P

Which leads me to the question of whether it would be safe to just
CTRL-C a running xfs_repair (or if there is a different graceful
shutdown mechanism).

In any case, I would then retry with 3.1.0/3.1.1.

Thanks,
Richard
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Richard Hartmann @ 2010-02-12 17:06 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: xfs

On Fri, Feb 12, 2010 at 18:04, Richard Hartmann
<richih.mailinglist@gmail.com> wrote:

> Which leads me to the question if it would be save to just CTRL-C a
> running xfs_repair (or if there is a different graceful shutdown function).

From the man page:

  -P  Disable prefetching of inode and directory blocks. Use this option
      if you find xfs_repair gets stuck and stops proceeding. Interrupting
      a stuck xfs_repair is safe.

Well, OK :)

Thanks,
Richard
* Re: xfs_repair on a 1.5 TiB image has been hanging for about an hour, now
From: Eric Sandeen @ 2010-02-12 17:13 UTC (permalink / raw)
To: Richard Hartmann; +Cc: Chris Wedgwood, xfs

Richard Hartmann wrote:
> On Fri, Feb 12, 2010 at 17:58, Chris Wedgwood <cw@f00f.org> wrote:
>
>> try adding -P
>
> Which leads me to the question if it would be save to just CTRL-C a
> running xfs_repair (or if there is a different graceful shutdown function).

It is safe, as the manpage says right there by the -P option ;)

-Eric

> In any case, I would then retry with 3.1.0/3.1.1.