public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* xfs_admin -c 1  + xfs_repair problem
@ 2008-04-28 18:30 Daniel Bast
  2008-04-29  0:48 ` Barry Naujok
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Bast @ 2008-04-28 18:30 UTC (permalink / raw)
  To: xfs

Hi,

I tried to enable lazy counts with "xfs_admin -c 1 device", using 
xfs_admin from xfsprogs 2.9.8. Unfortunately that process got stuck 
without printing any message. After several hours without any IO or CPU 
load I killed the process and started xfs_repair, but that also got 
stuck (in "Phase 6") without any IO or CPU load or any further message. 
The xfs_repair hang in "Phase 6" is reproducible with a metadump image 
of the filesystem.
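To be explicit, the sequence of commands was roughly this (sketch only; 
/dev/sda7 is the device, as in the output further down):

```shell
# Enable lazy superblock counters (xfsprogs 2.9.8).
# This hung indefinitely with no output, no IO and no CPU load.
xfs_admin -c 1 /dev/sda7

# After killing the above, a plain repair hung the same way in Phase 6.
xfs_repair -v /dev/sda7
```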

I was able to mount the device but don't want to use it because I'm not 
sure everything is OK.

How can I resolve this problem? What information do you need? I can 
provide the metadump image (bzip2-compressed: 28 MB) if necessary.

Here is some information that may be useful:

  xfs_repair -v /dev/sda7
  Phase 1 - find and verify superblock...
          - block cache size set to 11472 entries
  Phase 2 - using internal log
          - zero log...
  zero_log: head block 2 tail block 2
          - scan filesystem freespace and inode maps...
          - found root inode chunk
  Phase 3 - for each AG...
          - scan and clear agi unlinked lists...
          - process known inodes and perform inode discovery...
          - agno = 0
          - agno = 1
          - agno = 2
          - agno = 3
          - process newly discovered inodes...
  Phase 4 - check for duplicate blocks...
          - setting up duplicate extent list...
          - check for inodes claiming duplicate blocks...
          - agno = 0
          - agno = 1
          - agno = 2
          - agno = 3
  Phase 5 - rebuild AG headers and trees...
          - agno = 0
          - agno = 1
          - agno = 2
          - agno = 3
          - reset superblock...
  Phase 6 - check inode connectivity...
          - resetting contents of realtime bitmap and summary inodes
          - traversing filesystem ...
          - agno = 0


after the killed xfs_admin -c 1 and xfs_repair processes:
xfs_info /dev/sda7
meta-data=/dev/sda7              isize=256    agcount=4, agsize=24719013 
blks
          =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=98876050, imaxpct=25
          =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
          =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=65536  blocks=0, rtextents=0


a new 'xfs_repair -v /dev/sda7' straced:
strace -ff -p 6364
Process 6409 attached with 6 threads - interrupt to quit
[pid  6364] futex(0x851e2cc, FUTEX_WAIT, 2, NULL <unfinished ...>
[pid  6405] futex(0xb146e3d8, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid  6406] futex(0xb146e358, FUTEX_WAIT, 1, NULL <unfinished ...>
[pid  6407] futex(0xb146e358, FUTEX_WAIT, 2, NULL <unfinished ...>
[pid  6408] futex(0xb146e358, FUTEX_WAIT, 3, NULL <unfinished ...>
[pid  6409] futex(0xb146e358, FUTEX_WAIT, 4, NULL <unfinished ...>
[pid  6406] <... futex resumed> )       = -1 EAGAIN (Resource 
temporarily unavailable)
[pid  6407] <... futex resumed> )       = -1 EAGAIN (Resource 
temporarily unavailable)
[pid  6408] <... futex resumed> )       = -1 EAGAIN (Resource 
temporarily unavailable)
[pid  6406] futex(0xb146e358, FUTEX_WAIT, 4, NULL <unfinished ...>
[pid  6407] futex(0xb146e358, FUTEX_WAIT, 4, NULL <unfinished ...>
[pid  6408] futex(0xb146e358, FUTEX_WAIT, 4, NULL


Thanks
  Daniel

P.S. Please CC me, because I'm not subscribed to the list.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: xfs_admin -c 1  + xfs_repair problem
  2008-04-28 18:30 xfs_admin -c 1 + xfs_repair problem Daniel Bast
@ 2008-04-29  0:48 ` Barry Naujok
  2008-04-29  6:34   ` Daniel Bast
  0 siblings, 1 reply; 4+ messages in thread
From: Barry Naujok @ 2008-04-29  0:48 UTC (permalink / raw)
  To: Daniel Bast, xfs

On Tue, 29 Apr 2008 04:30:56 +1000, Daniel Bast <daniel.bast@gmx.net>  
wrote:

> Hi,
>
> I tried to enable lazy counts with "xfs_admin -c 1 device", using 
> xfs_admin from xfsprogs 2.9.8. Unfortunately that process got stuck 
> without printing any message. After several hours without any IO or CPU 
> load I killed the process and started xfs_repair, but that also got 
> stuck (in "Phase 6") without any IO or CPU load or any further message. 
> The xfs_repair hang in "Phase 6" is reproducible with a metadump image 
> of the filesystem.
>
> I was able to mount the device but don't want to use it because I'm not 
> sure everything is OK.

"xfs_admin -c 1" internally runs xfs_repair, which is why it got stuck
too. Your filesystem is fine; the only changes made when enabling
lazy-counters happen in Phase 5, and they may not have been written to disk.
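In xfsprogs of that vintage, xfs_admin is a small shell script, and for
the -c option it just hands the work to xfs_repair, roughly like this
(simplified sketch, not the verbatim script):

```shell
# What "xfs_admin -c 1 <device>" boils down to internally:
# a hang inside xfs_repair therefore hangs xfs_admin as well.
xfs_repair -c lazycount=1 /dev/sda7
```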

> How can I resolve this problem? What information do you need? I can 
> provide the metadump image (bzip2-compressed: 28 MB) if necessary.

Run xfs_repair -P <device> to disable prefetch.

The metadump would be very useful in finding out why xfs_repair got stuck.

Regards,
Barry.

> Here is some information that may be useful:
>
>   xfs_repair -v /dev/sda7
>   Phase 1 - find and verify superblock...
>           - block cache size set to 11472 entries
>   Phase 2 - using internal log
>           - zero log...
>   zero_log: head block 2 tail block 2
>           - scan filesystem freespace and inode maps...
>           - found root inode chunk
>   Phase 3 - for each AG...
>           - scan and clear agi unlinked lists...
>           - process known inodes and perform inode discovery...
>           - agno = 0
>           - agno = 1
>           - agno = 2
>           - agno = 3
>           - process newly discovered inodes...
>   Phase 4 - check for duplicate blocks...
>           - setting up duplicate extent list...
>           - check for inodes claiming duplicate blocks...
>           - agno = 0
>           - agno = 1
>           - agno = 2
>           - agno = 3
>   Phase 5 - rebuild AG headers and trees...
>           - agno = 0
>           - agno = 1
>           - agno = 2
>           - agno = 3
>           - reset superblock...
>   Phase 6 - check inode connectivity...
>           - resetting contents of realtime bitmap and summary inodes
>           - traversing filesystem ...
>           - agno = 0
>
>
> after the killed xfs_admin -c 1 and xfs_repair processes:
> xfs_info /dev/sda7
> meta-data=/dev/sda7              isize=256    agcount=4, agsize=24719013  
> blks
>           =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=98876050, imaxpct=25
>           =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096
> log      =internal               bsize=4096   blocks=32768, version=2
>           =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=65536  blocks=0, rtextents=0
>
>
> a new 'xfs_repair -v /dev/sda7' straced:
> strace -ff -p 6364
> Process 6409 attached with 6 threads - interrupt to quit
> [pid  6364] futex(0x851e2cc, FUTEX_WAIT, 2, NULL <unfinished ...>
> [pid  6405] futex(0xb146e3d8, FUTEX_WAIT, 0, NULL <unfinished ...>
> [pid  6406] futex(0xb146e358, FUTEX_WAIT, 1, NULL <unfinished ...>
> [pid  6407] futex(0xb146e358, FUTEX_WAIT, 2, NULL <unfinished ...>
> [pid  6408] futex(0xb146e358, FUTEX_WAIT, 3, NULL <unfinished ...>
> [pid  6409] futex(0xb146e358, FUTEX_WAIT, 4, NULL <unfinished ...>
> [pid  6406] <... futex resumed> )       = -1 EAGAIN (Resource  
> temporarily unavailable)
> [pid  6407] <... futex resumed> )       = -1 EAGAIN (Resource  
> temporarily unavailable)
> [pid  6408] <... futex resumed> )       = -1 EAGAIN (Resource  
> temporarily unavailable)
> [pid  6406] futex(0xb146e358, FUTEX_WAIT, 4, NULL <unfinished ...>
> [pid  6407] futex(0xb146e358, FUTEX_WAIT, 4, NULL <unfinished ...>
> [pid  6408] futex(0xb146e358, FUTEX_WAIT, 4, NULL
>
>
> Thanks
>   Daniel
>
> P.S. Please CC me, because I'm not subscribed to the list.
>
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: xfs_admin -c 1  + xfs_repair problem
  2008-04-29  0:48 ` Barry Naujok
@ 2008-04-29  6:34   ` Daniel Bast
  2008-04-29  6:48     ` Barry Naujok
  0 siblings, 1 reply; 4+ messages in thread
From: Daniel Bast @ 2008-04-29  6:34 UTC (permalink / raw)
  To: xfs

Hi Barry,

'xfs_repair -P device' ran through and finished without any problem. So 
everything should be fine?
Or should I also run something like 'xfs_repair -P -c lazy-counts=1 
device' to make sure the lazy-count enable actually went through?

After one '-P' run, another run without '-P' still doesn't finish, so 
I'll send you the metadump later, once I've figured out how to send a 
28 MB email attachment.

Thanks
  Daniel




Barry Naujok schrieb:
> On Tue, 29 Apr 2008 04:30:56 +1000, Daniel Bast <daniel.bast@gmx.net> 
> wrote:
> 
>> Hi,
>>
>> I tried to enable lazy counts with "xfs_admin -c 1 device", using 
>> xfs_admin from xfsprogs 2.9.8. Unfortunately that process got stuck 
>> without printing any message. After several hours without any IO or 
>> CPU load I killed the process and started xfs_repair, but that also 
>> got stuck (in "Phase 6") without any IO or CPU load or any further 
>> message. The xfs_repair hang in "Phase 6" is reproducible with a 
>> metadump image of the filesystem.
>>
>> I was able to mount the device but don't want to use it because I'm 
>> not sure everything is OK.
> 
> "xfs_admin -c 1" internally runs xfs_repair, which is why it got stuck
> too. Your filesystem is fine; the only changes made when enabling
> lazy-counters happen in Phase 5, and they may not have been written to disk.
> 
>> How can I resolve this problem? What information do you need? I can 
>> provide the metadump image (bzip2-compressed: 28 MB) if necessary.
> 
> Run xfs_repair -P <device> to disable prefetch.
> 
> The metadump would be very useful in finding out why xfs_repair got stuck.
> 
> Regards,
> Barry.
> 
>> Here is some information that may be useful:
>>
>>   xfs_repair -v /dev/sda7
>>   Phase 1 - find and verify superblock...
>>           - block cache size set to 11472 entries
>>   Phase 2 - using internal log
>>           - zero log...
>>   zero_log: head block 2 tail block 2
>>           - scan filesystem freespace and inode maps...
>>           - found root inode chunk
>>   Phase 3 - for each AG...
>>           - scan and clear agi unlinked lists...
>>           - process known inodes and perform inode discovery...
>>           - agno = 0
>>           - agno = 1
>>           - agno = 2
>>           - agno = 3
>>           - process newly discovered inodes...
>>   Phase 4 - check for duplicate blocks...
>>           - setting up duplicate extent list...
>>           - check for inodes claiming duplicate blocks...
>>           - agno = 0
>>           - agno = 1
>>           - agno = 2
>>           - agno = 3
>>   Phase 5 - rebuild AG headers and trees...
>>           - agno = 0
>>           - agno = 1
>>           - agno = 2
>>           - agno = 3
>>           - reset superblock...
>>   Phase 6 - check inode connectivity...
>>           - resetting contents of realtime bitmap and summary inodes
>>           - traversing filesystem ...
>>           - agno = 0
>>
>>
>> after the killed xfs_admin -c 1 and xfs_repair processes:
>> xfs_info /dev/sda7
>> meta-data=/dev/sda7              isize=256    agcount=4, 
>> agsize=24719013 blks
>>           =                       sectsz=512   attr=2
>> data     =                       bsize=4096   blocks=98876050, imaxpct=25
>>           =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096
>> log      =internal               bsize=4096   blocks=32768, version=2
>>           =                       sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none                   extsz=65536  blocks=0, rtextents=0
>>
>>
>> a new 'xfs_repair -v /dev/sda7' straced:
>> strace -ff -p 6364
>> Process 6409 attached with 6 threads - interrupt to quit
>> [pid  6364] futex(0x851e2cc, FUTEX_WAIT, 2, NULL <unfinished ...>
>> [pid  6405] futex(0xb146e3d8, FUTEX_WAIT, 0, NULL <unfinished ...>
>> [pid  6406] futex(0xb146e358, FUTEX_WAIT, 1, NULL <unfinished ...>
>> [pid  6407] futex(0xb146e358, FUTEX_WAIT, 2, NULL <unfinished ...>
>> [pid  6408] futex(0xb146e358, FUTEX_WAIT, 3, NULL <unfinished ...>
>> [pid  6409] futex(0xb146e358, FUTEX_WAIT, 4, NULL <unfinished ...>
>> [pid  6406] <... futex resumed> )       = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> [pid  6407] <... futex resumed> )       = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> [pid  6408] <... futex resumed> )       = -1 EAGAIN (Resource 
>> temporarily unavailable)
>> [pid  6406] futex(0xb146e358, FUTEX_WAIT, 4, NULL <unfinished ...>
>> [pid  6407] futex(0xb146e358, FUTEX_WAIT, 4, NULL <unfinished ...>
>> [pid  6408] futex(0xb146e358, FUTEX_WAIT, 4, NULL
>>
>>
>> Thanks
>>   Daniel
>>
>> P.S. Please CC me, because I'm not subscribed to the list.
>>
>>
> 
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: xfs_admin -c 1  + xfs_repair problem
  2008-04-29  6:34   ` Daniel Bast
@ 2008-04-29  6:48     ` Barry Naujok
  0 siblings, 0 replies; 4+ messages in thread
From: Barry Naujok @ 2008-04-29  6:48 UTC (permalink / raw)
  To: Daniel Bast; +Cc: xfs@oss.sgi.com

On Tue, 29 Apr 2008 16:34:29 +1000, Daniel Bast <daniel.bast@gmx.net>  
wrote:

> Hi Barry,
>
> 'xfs_repair -P device' ran through and finished without any problem. So 
> everything should be fine?
> Or should I also run something like 'xfs_repair -P -c lazy-counts=1 
> device' to make sure the lazy-count enable actually went through?

Once mounted (yes, everything is fine), xfs_info will tell you whether
lazy-counters were enabled.

If they weren't, xfs_repair -P -c lazycount=1 <device> will do it.
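Putting those two steps together, a check-then-enable sequence might look
like this (sketch only; device name as in the thread):

```shell
# Check whether lazy counters are already on, as in the xfs_info
# output earlier in the thread (look for lazy-count=1).
xfs_info /dev/sda7 | grep lazy-count

# If it still shows lazy-count=0, enable it with prefetch disabled;
# the filesystem must be unmounted while xfs_repair runs.
xfs_repair -P -c lazycount=1 /dev/sda7
```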

> After one '-P' run, another run without '-P' still doesn't finish, so 
> I'll send you the metadump later, once I've figured out how to send a 
> 28 MB email attachment.

28 MB is a bit big for email. Some place where you can stash a binary
file would be better (an FTP server, or a web service like yousendit).
You can email me privately with the details.

Regards,
Barry.

> Thanks
>   Daniel
>
>
>
>
> Barry Naujok schrieb:
>> On Tue, 29 Apr 2008 04:30:56 +1000, Daniel Bast <daniel.bast@gmx.net>  
>> wrote:
>>
>>> Hi,
>>>
>>> I tried to enable lazy counts with "xfs_admin -c 1 device", using 
>>> xfs_admin from xfsprogs 2.9.8. Unfortunately that process got stuck 
>>> without printing any message. After several hours without any IO or 
>>> CPU load I killed the process and started xfs_repair, but that also 
>>> got stuck (in "Phase 6") without any IO or CPU load or any further 
>>> message. The xfs_repair hang in "Phase 6" is reproducible with a 
>>> metadump image of the filesystem.
>>>
>>> I was able to mount the device but don't want to use it because I'm 
>>> not sure everything is OK.
>> "xfs_admin -c 1" internally runs xfs_repair, which is why it got stuck
>> too. Your filesystem is fine; the only changes made when enabling
>> lazy-counters happen in Phase 5, and they may not have been written
>> to disk.
>>
>>> How can I resolve this problem? What information do you need? I can 
>>> provide the metadump image (bzip2-compressed: 28 MB) if necessary.
>>  Run xfs_repair -P <device> to disable prefetch.
>>  The metadump would be very useful in finding out why xfs_repair got  
>> stuck.
>>  Regards,
>> Barry.
>>
>>> Here is some information that may be useful:
>>>
>>>   xfs_repair -v /dev/sda7
>>>   Phase 1 - find and verify superblock...
>>>           - block cache size set to 11472 entries
>>>   Phase 2 - using internal log
>>>           - zero log...
>>>   zero_log: head block 2 tail block 2
>>>           - scan filesystem freespace and inode maps...
>>>           - found root inode chunk
>>>   Phase 3 - for each AG...
>>>           - scan and clear agi unlinked lists...
>>>           - process known inodes and perform inode discovery...
>>>           - agno = 0
>>>           - agno = 1
>>>           - agno = 2
>>>           - agno = 3
>>>           - process newly discovered inodes...
>>>   Phase 4 - check for duplicate blocks...
>>>           - setting up duplicate extent list...
>>>           - check for inodes claiming duplicate blocks...
>>>           - agno = 0
>>>           - agno = 1
>>>           - agno = 2
>>>           - agno = 3
>>>   Phase 5 - rebuild AG headers and trees...
>>>           - agno = 0
>>>           - agno = 1
>>>           - agno = 2
>>>           - agno = 3
>>>           - reset superblock...
>>>   Phase 6 - check inode connectivity...
>>>           - resetting contents of realtime bitmap and summary inodes
>>>           - traversing filesystem ...
>>>           - agno = 0
>>>
>>>
>>> after the killed xfs_admin -c 1 and xfs_repair processes:
>>> xfs_info /dev/sda7
>>> meta-data=/dev/sda7              isize=256    agcount=4,  
>>> agsize=24719013 blks
>>>           =                       sectsz=512   attr=2
>>> data     =                       bsize=4096   blocks=98876050,  
>>> imaxpct=25
>>>           =                       sunit=0      swidth=0 blks
>>> naming   =version 2              bsize=4096
>>> log      =internal               bsize=4096   blocks=32768, version=2
>>>           =                       sectsz=512   sunit=0 blks,  
>>> lazy-count=1
>>> realtime =none                   extsz=65536  blocks=0, rtextents=0
>>>
>>>
>>> a new 'xfs_repair -v /dev/sda7' straced:
>>> strace -ff -p 6364
>>> Process 6409 attached with 6 threads - interrupt to quit
>>> [pid  6364] futex(0x851e2cc, FUTEX_WAIT, 2, NULL <unfinished ...>
>>> [pid  6405] futex(0xb146e3d8, FUTEX_WAIT, 0, NULL <unfinished ...>
>>> [pid  6406] futex(0xb146e358, FUTEX_WAIT, 1, NULL <unfinished ...>
>>> [pid  6407] futex(0xb146e358, FUTEX_WAIT, 2, NULL <unfinished ...>
>>> [pid  6408] futex(0xb146e358, FUTEX_WAIT, 3, NULL <unfinished ...>
>>> [pid  6409] futex(0xb146e358, FUTEX_WAIT, 4, NULL <unfinished ...>
>>> [pid  6406] <... futex resumed> )       = -1 EAGAIN (Resource  
>>> temporarily unavailable)
>>> [pid  6407] <... futex resumed> )       = -1 EAGAIN (Resource  
>>> temporarily unavailable)
>>> [pid  6408] <... futex resumed> )       = -1 EAGAIN (Resource  
>>> temporarily unavailable)
>>> [pid  6406] futex(0xb146e358, FUTEX_WAIT, 4, NULL <unfinished ...>
>>> [pid  6407] futex(0xb146e358, FUTEX_WAIT, 4, NULL <unfinished ...>
>>> [pid  6408] futex(0xb146e358, FUTEX_WAIT, 4, NULL
>>>
>>>
>>> Thanks
>>>   Daniel
>>>
>>> P.S. Please CC me, because I'm not subscribed to the list.
>>>
>>>
>>
>
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2008-04-29  6:46 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-28 18:30 xfs_admin -c 1 + xfs_repair problem Daniel Bast
2008-04-29  0:48 ` Barry Naujok
2008-04-29  6:34   ` Daniel Bast
2008-04-29  6:48     ` Barry Naujok

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox