* xfs corrupted
From: katmai @ 2013-10-15 8:41 UTC
To: xfs
Hi guys,
I have a problem. Yesterday there was a power outage at one of my
datacenters, where I have a relatively large fileserver: two arrays, one 14 TB
and one 18 TB, both RAID 6 on an Adaptec card.
After the outage, the server came back online, the XFS partitions were
mounted, and everything looked okay. I could access the data and everything
seemed just fine.
Today I woke up to lots of I/O errors, and when I rebooted the server, the
partitions would not mount:
Oct 14 04:09:17 kp4 kernel:
Oct 14 04:09:17 kp4 kernel: XFS internal error XFS_WANT_CORRUPTED_RETURN
[<ffffffff80056933>] pdflush+0x0/0x1fb
Oct 14 04:09:17 kp4 kernel: [<ffffffff80056a84>] pdflush+0x151/0x1fb
Oct 14 04:09:17 kp4 kernel: [<ffffffff800cd931>] wb_kupdate+0x0/0x16a
Oct 14 04:09:17 kp4 kernel: [<ffffffff80032c2b>] kthread+0xfe/0x132
Oct 14 04:09:17 kp4 kernel: [<ffffffff8005dfc1>] child_rip+0xa/0x11
Oct 14 04:09:17 kp4 kernel: [<ffffffff800a3ab7>] keventd_create_kthread+0x0/0xc4
Oct 14 04:09:17 kp4 kernel: [<ffffffff80032b2d>] kthread+0x0/0x132
Oct 14 04:09:17 kp4 kernel: [<ffffffff8005dfb7>] child_rip+0x0/0x11
Oct 14 04:09:17 kp4 kernel:
Oct 14 04:09:17 kp4 kernel: XFS internal error XFS_WANT_CORRUPTED_RETURN at line 279 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff88342331
Oct 14 04:09:17 kp4 kernel:
I got a bunch of these in dmesg.
I googled for solutions, and I think I acted too hastily by running
xfs_repair -L /dev/sdc. Plain xfs_repair /dev/sdc would not clean it, and
everyone online pretty much gives the same advice.
This is what I was getting when trying to mount the array:
Filesystem Corruption of in-memory data detected. Shutting down filesystem
xfs_check
Did I jump the gun by using the -L switch? :/
--
View this message in context: http://xfs.9218.n7.nabble.com/xfs-corrupted-tp35009.html
Sent from the Xfs - General mailing list archive at Nabble.com.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: xfs corrupted
From: Emmanuel Florac @ 2013-10-15 18:34 UTC
To: katmai; +Cc: xfs
On Tue, 15 Oct 2013 01:41:47 -0700 (PDT) you wrote:
> Did i jump the gun by using the -L switch :/ ?
You should have checked that the RAID is optimal first! With failing
hardware, any write to the volume can make the problems worse.
You should use arcconf to check the RAID state (arcconf getstatus 1)
and, if necessary, run a RAID repair
(arcconf task start 1 logicaldrive 0 verify_fix).
--
------------------------------------------------------------------------
Emmanuel Florac | Technical Director
| Intellique
| <eflorac@intellique.com>
| +33 1 78 94 84 02
------------------------------------------------------------------------
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-15 18:45 UTC
To: Emmanuel Florac; +Cc: xfs
That was the first thing I checked: the array was optimal, and I checked
each drive with smartctl; they are all fine.
I left xfs_repair running overnight, and it showed no progress. I was
actually thinking that maybe the memory was bad, so I took the server
offline this morning and ran memtest for 3 hours, which found nothing
wrong with the sticks. There is some good news, though:
I was able to mount the array, but I can only read from it. Whenever I
try to write something, it just hangs.
I ran xfs_repair -n on the second array, which is 18 TB in size as
opposed to 14 TB for the first one, and that check completed in about
10 minutes.
I am now running xfs_repair -n on the bad 14 TB array, and it has been
stuck here for about 5 hours:
[root@kp4 ~]# umount /home
[root@kp4 ~]# xfs_repair -n /dev/sdc
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
What worries me is that I see 100% CPU usage and about 74% memory usage (I
have 4 GB of RAM), but no disk activity at all. I would expect at least
some reads if xfs_repair were doing something.
* Re: xfs corrupted
From: Chris Murphy @ 2013-10-15 19:07 UTC
To: xfs@oss.sgi.com
On Oct 15, 2013, at 12:45 PM, Stefanita Rares Dumitrescu <katmai@keptprivate.com> wrote:
>
> What worries me is that i see 100 % cpu usage, some 74 % memory usage (i have 4 gb ram) but there is no disk activity at all. I was thinking that it would be at least some reads if the xfs_repair is doing something.
That is very little RAM for a system with two big arrays attached, so if repair finds something that needs repairing, it's going to take a long time.
http://xfs.org/index.php/XFS_FAQ#Q:_Which_factors_influence_the_memory_usage_of_xfs_repair.3F
Chris Murphy
* Re: xfs corrupted
From: Emmanuel Florac @ 2013-10-15 19:34 UTC
To: Stefanita Rares Dumitrescu; +Cc: xfs
On Tue, 15 Oct 2013 20:45:59 +0200 you wrote:
> What worries me is that i see 100 % cpu usage, some 74 % memory usage
> (i have 4 gb ram) but there is no disk activity at all. I was
> thinking that it would be at least some reads if the xfs_repair is
> doing something.
What does the output of "iostat -mx 5" look like? Is there a lot of I/O
wait, or just no activity at all? Anything in the dmesg output?
* Re: xfs corrupted
From: Emmanuel Florac @ 2013-10-15 19:52 UTC
To: Chris Murphy; +Cc: xfs@oss.sgi.com
On Tue, 15 Oct 2013 13:07:22 -0600 you wrote:
> That is very low RAM for a system with two big arrays attached. So if
> repair finds it needs to repair something it's going to take a long
> time.
> http://xfs.org/index.php/XFS_FAQ#Q:_Which_factors_influence_the_memory_usage_of_xfs_repair.3F
With a recent xfs_repair (3.x), 4 GB is large enough. I've recently
checked similar arrays on 4 GB machines in a couple of minutes.
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-15 19:57 UTC
To: Emmanuel Florac; +Cc: xfs
Since I am using CentOS 5.9, its version of xfsprogs seems to be old,
so I cloned the current one from SGI.
The machine has 4 GB of RAM and 4 GB of swap; xfs_repair has eaten all
of it, and the machine has slowed down to a crawl.
The sdc partition is the one being checked. I am completely out of
memory now: 4 GB physical and 4 GB swap, all gone.
http://pastebin.ca/2467064
(posted to pastebin for better formatting)
I was using:
[root@kp4 ~]# xfs_repair -o bhash=16384 -o ihash=16384 -o ag_stride=16 \
> /dev/sdc >& /tmp/repair.log
but now I am trying the -m option to see if the memory use can be
limited, so the server doesn't freeze:
[root@kp4 ~]# xfs_repair -m 3072 -o ag_stride=16 /dev/sdc >& /tmp/repair.log
Nothing in dmesg either.
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-15 20:02 UTC
To: Emmanuel Florac; +Cc: xfs
-m maxmem
Specifies the approximate maximum amount of memory, in
megabytes, to use for xfs_repair. xfs_repair has its own internal block
cache which will scale out up to the lesser of the process’s virtual
address limit or about 75% of the system’s physical RAM.
This option overrides these limits.
NOTE: These memory limits are only approximate and may
use more than the specified limit.
I set the limit to 3 GB, but 2.5 GB of swap is already in use and it is
still going up. :/
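To put the man page numbers in context, here is a minimal Python sketch of the default ceiling the excerpt describes: the lesser of the process's virtual address limit or about 75% of physical RAM. The function name default_cache_ceiling_mb is hypothetical and models only the quoted rule, not xfs_repair's actual code. On this 4 GB machine the default already works out to about 3072 MB, so -m 3072 would not be expected to change much; and, as the NOTE says, the limit is approximate and may be exceeded.

```python
# Hypothetical model of the default ceiling described in the man page excerpt:
# the block cache grows up to the lesser of the process's virtual address limit
# or about 75% of physical RAM. Illustration only, not xfs_repair's code.
def default_cache_ceiling_mb(phys_ram_mb, vaddr_limit_mb=None):
    """Approximate default cache ceiling in MB; None means no address-space limit."""
    three_quarters = phys_ram_mb * 3 // 4  # "about 75%" of physical RAM
    if vaddr_limit_mb is None:
        return three_quarters
    return min(vaddr_limit_mb, three_quarters)

# The server in this thread has 4 GB (4096 MB) of RAM; no ulimit -v is assumed.
print(default_cache_ceiling_mb(4096))  # 3072
```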
* Re: xfs corrupted
From: Emmanuel Florac @ 2013-10-15 20:05 UTC
To: Stefanita Rares Dumitrescu; +Cc: xfs
On Tue, 15 Oct 2013 21:57:47 +0200 you wrote:
> I have a machine with 4 gb ram, and 4 gb swap, and it's all been
> eaten up by xfs_repair, and slowed down to a crawl.
>
> the sdc partition is the one being checked. i am all out of memory
> now. 4 gb phys and 4 gb swap all gone.
>
> http://pastebin.ca/2467064
>
> posted to pastebin for better formatting.
>
> i was using:
>
> [root@kp4 ~]# xfs_repair -o bhash=16384 -o ihash=16384 -o
> ag_stride=16 \
> > /dev/sdc >& /tmp/repair.log
>
> but now i am trying the -m option to see if the memory can be
> limited, so the server doesn't freeze.
Or maybe you could turn swap off entirely for the check. Apparently
all of the I/O is going to sda, which I suppose hosts the swap.
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-15 20:17 UTC
To: Emmanuel Florac; +Cc: xfs
Hmm, that never occurred to me. I just turned the swap off and am trying
again.
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-15 20:18 UTC
To: Emmanuel Florac; +Cc: xfs
Well, that did not work. The machine just freezes.
* Re: xfs corrupted
From: Dave Chinner @ 2013-10-15 20:26 UTC
To: Stefanita Rares Dumitrescu; +Cc: xfs
On Tue, Oct 15, 2013 at 09:57:47PM +0200, Stefanita Rares Dumitrescu wrote:
> Since i am using centos 5.9, the version of the xfsprogs seems to be
> old, so i cloned the new one from sgi.
>
> I have a machine with 4 gb ram, and 4 gb swap, and it's all been
> eaten up by xfs_repair, and slowed down to a crawl.
>
> the sdc partition is the one being checked. i am all out of memory
> now. 4 gb phys and 4 gb swap all gone.
>
> http://pastebin.ca/2467064
>
> posted to pastebin for better formatting.
>
> i was using:
>
> [root@kp4 ~]# xfs_repair -o bhash=16384 -o ihash=16384 -o ag_stride=16 \
> > /dev/sdc >& /tmp/repair.log
You don't have enough RAM to run threaded prefetching and parallel
AG processing. You'd do better to turn prefetching off entirely with
"-P" if you are having OOM problems.
> but now i am trying the -m option to see if the memory can be
> limited, so the server doesn't freeze.
>
> [root@kp4 ~]# xfs_repair -m 3072 -o ag_stride=16 /dev/sdc >& /tmp/repair.log
>
> nothing in dmesg either.
Give it another 10-20GB of swap, and it should be fine. xfs_repair
usually only thrashes swap when you don't have enough of it and it
keeps trying to free memory, paging in pages that are in swap to
free cached objects from them. Most of the memory references that
repair makes are quite local, so when pages are swapped out they
generally aren't needed again for a while except when cache reclaim
kicks in. Hence if you give it enough swap that it can grow without
bounds, then it should still be quite efficient.
Keep in mind that badly corrupted filesystems require lots more
memory than clean filesystems to check and repair as there is lots
more intermediate state that repair needs to hold in memory about
partially or incompletely referenced objects. Don't be surprised if
the amount of memory needed to repair a badly broken filesystem is
10-100x the amount of RAM needed to run xfs_repair on the same clean
filesystem....
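The 10-100x rule of thumb above can be turned into a rough capacity check. This is a minimal Python sketch, assuming RAM plus swap is the whole usable budget (ignoring the OS and page cache) and using the thread's 4 GB of RAM, the original 4 GB of swap, and the suggested extra 10-20 GB; max_clean_footprint_gb is a hypothetical name. It answers the inverse question, since the clean-filesystem footprint is not known here: how small would it have to be for the corrupted repair to fit?

```python
# Rule of thumb from above: a badly corrupted filesystem may need 10-100x the
# memory xfs_repair uses on the same filesystem when clean. Treating RAM + swap
# as the whole budget (an assumption), find the largest clean footprint that fits.
def max_clean_footprint_gb(ram_gb, swap_gb, multiplier):
    """Largest clean-repair memory footprint (GB) that still fits after scaling."""
    return (ram_gb + swap_gb) / multiplier

RAM_GB = 4
for swap_gb in (4, 14, 24):  # original 4 GB swap, then +10 GB and +20 GB
    worst = max_clean_footprint_gb(RAM_GB, swap_gb, 100)
    best = max_clean_footprint_gb(RAM_GB, swap_gb, 10)
    print(f"{RAM_GB} GB RAM + {swap_gb} GB swap: clean footprint must be "
          f"<= {worst:.2f} GB (at 100x) or <= {best:.1f} GB (at 10x)")
```

Each extra 10 GB of swap raises the ceiling by 1 GB at the 10x end but by only 0.1 GB at the 100x end, which is why the degree of corruption matters so much.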
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-16 12:23 UTC
To: Dave Chinner; +Cc: xfs
I have a small 40 GB SSD, and I tried to be clever by resizing the OS
partition to make room for more swap. I botched it, so today I quickly
reinstalled CentOS 6.
I have all the important stuff backed up, so reinstalling did not really
matter. However:
I made a 14 GB swap partition, but on CentOS 6 xfs_repair doesn't even
go above 2.7 GB, and none of the swap is used.
I am using xfsprogs-3.1.1-10.el6_4.1.x86_64
So far so good: I see a lot of reads on the broken array. Just to be
safe, I mounted it first and checked that I could read the data, and it
was all fine.
I will keep you updated. Hopefully I can get this over with.
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-16 13:32 UTC
To: Dave Chinner; +Cc: xfs
Quick update:
The xfsprogs from the CentOS 6 yum repo is newer and doesn't use that
much memory; however, I got 2 segfaults and the process stopped.
I cloned the xfsprogs git repo and am now running it with the new 15 GB
swap I created, and it is a monster in memory usage.
Quite a discrepancy.
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-16 14:32 UTC
To: Dave Chinner; +Cc: xfs
I have been running xfs_repair from the SGI git repo for quite a while
now, and it keeps eating memory, but there seems to be no progress. :/
doubling cache size to 1048576
- 09:27:05: process known inodes and inode discovery - 0 of 76088384 inodes done
- 09:42:05: process known inodes and inode discovery - 0 of 76088384 inodes done
doubling cache size to 2097152
- 09:57:05: process known inodes and inode discovery - 0 of 76088384 inodes done
- 10:12:05: process known inodes and inode discovery - 0 of 76088384 inodes done
- 10:27:05: process known inodes and inode discovery - 0 of 76088384 inodes done
The xfsprogs from yum got past this point, but it segfaulted.
I am going to give it a little more time; there are 4 GB of swap left.
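Rather than eyeballing the log, a small helper can confirm whether the inode count moves between progress reports. This is a minimal, hypothetical Python sketch (not part of xfsprogs); it assumes the progress lines look exactly like the ones quoted above ("- HH:MM:SS: <phase> - N of M inodes done"), and other xfs_repair versions may format them differently.

```python
import re

# Hypothetical helper (not part of xfsprogs): it assumes progress lines look
# exactly like "- HH:MM:SS: <phase> - N of M inodes done", as in the log above.
PROGRESS_RE = re.compile(
    r"-\s*(\d{2}:\d{2}:\d{2}):\s*(.+?)\s+-\s+(\d+)\s+of\s+(\d+)\s+inodes done"
)

def progress_points(log_text):
    """Return (timestamp, phase, done, total) for each inode progress line."""
    return [(ts, phase, int(done), int(total))
            for ts, phase, done, total in PROGRESS_RE.findall(log_text)]

def is_stalled(points):
    """True if there are at least two reports and the 'done' count never changed."""
    return len(points) >= 2 and len({done for _, _, done, _ in points}) == 1

LOG_EXCERPT = """\
doubling cache size to 1048576
- 09:27:05: process known inodes and inode discovery - 0 of 76088384 inodes done
- 09:42:05: process known inodes and inode discovery - 0 of 76088384 inodes done
doubling cache size to 2097152
- 09:57:05: process known inodes and inode discovery - 0 of 76088384 inodes done
"""

points = progress_points(LOG_EXCERPT)
print(len(points), "reports, stalled:", is_stalled(points))  # 3 reports, stalled: True
```

A stall by this measure only says the counter has not moved; it does not distinguish a hung repair from one that is thrashing swap before its first update.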
* Re: xfs corrupted
From: Keith Keller @ 2013-10-16 17:33 UTC
To: linux-xfs
On 2013-10-16, Stefanita Rares Dumitrescu <katmai@keptprivate.com> wrote:
>
> The xfsprogs from the centos6 yum are newer
They are certainly newer than CentOS 5's, but are still fairly old
compared to git. You should probably prefer the latest stable release
or the git version over what's available from yum by default.
> I cloned the xfsprogs git and i am running it now with the new 15 gb
> swap that i created, and this is a monster in memory usage.
>
> Pretty bit of discrepancy.
As others have suggested, heavy memory use is to be expected given the
size of the filesystem and the amount of memory you have. Did you use
the -P switch as Dave suggested? I have found it very helpful in
low-memory situations.
--keith
--
kkeller@wombat.san-francisco.ca.us
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-16 20:52 UTC
To: Dave Chinner; +Cc: xfs
Another quick update:
after reinstalling CentOS 6, I noticed that both arrays were in verifying
status, so I stopped xfs_repair to see whether the RAID array had any
inconsistencies. It did, and they were repaired. So my advice here is
that even if the arrays show as okay, you should force a verify after a
power outage.
The array verify has now completed and some errors were fixed, so I am
running xfs_repair once more on the broken array.
For the record, I can now write to the array without hangs or lag.
On 15/10/2013 22:26, Dave Chinner wrote:
> On Tue, Oct 15, 2013 at 09:57:47PM +0200, Stefanita Rares Dumitrescu wrote:
>> Since i am using centos 5.9, the version of the xfsprogs seems to be
>> old, so i cloned the new one from sgi.
>>
>> I have a machine with 4 gb ram, and 4 gb swap, and it's all been
>> eaten up by xfs_repair, and slowed down to a crawl.
>>
>> the sdc partition is the one being checked. i am all out of memory
>> now. 4 gb phys and 4 gb swap all gone.
>>
>> http://pastebin.ca/2467064
>>
>> posted to pastebin for better formatting.
>>
>> i was using:
>>
>> [root@kp4 ~]# xfs_repair -o bhash=16384 -o ihash=16384 -o ag_stride=16 \
>>> /dev/sdc >& /tmp/repair.log
>
> You don't have enough RAM to run threaded prefetching and parallel
> AG processing. You'd do better to turn prefetching off entirely with
> "-P" if you are having OOM problems.
>
>> but now i am trying the -m option to see if the memory can be
>> limited, so the server doesn't freeze.
>>
>> [root@kp4 ~]# xfs_repair -m 3072 -o ag_stride=16 /dev/sdc >& /tmp/repair.log
>>
>> nothing in dmesg either.
>
> Give it another 10-20GB of swap, and it should be fine. xfs_repair
> usually only thrashes swap when you don't have enough of it and it
> keeps trying to free memory, paging in pages that are in swap to
> free cached objects from them. Most of the memory references that
> repair makes are quite local, so when pages are swapped out they
> generally aren't needed again for a while except when cache reclaim
> kicks in. Hence if you give it enough swap that it can grow without
> bounds, then it should still be quite efficient.
>
> Keep in mind that badly corrupted filesystems require lots more
> memory than clean filesystems to check and repair as there is lots
> more intermediate state that repair needs to hold in memory about
> partially or incompletely referenced objects. Don't be surprised if
> the amount of memory needed to repair a badly broken filesystem is
> 10-100x the amount of RAM needed to run xfs_repair on the same clean
> filesystem....
>
> Cheers,
>
> Dave.
>
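A minimal sketch of the memory advice above, assuming a Linux host (it parses the standard /proc/meminfo format, where values are in kB) and Python 3. The helper names parse_meminfo, memory_budget_mib and repair_argv are hypothetical and not part of xfsprogs; repair_argv only assembles documented xfs_repair options (-n for a no-modify check, -P to disable prefetching, -m for an approximate memory cap in megabytes) and does not run anything.

```python
from typing import Dict, List, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; values are in kB (KiB)."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:  # skip malformed lines without "key: value"
            fields[key.strip()] = int(parts[0])
    return fields


def memory_budget_mib(fields: Dict[str, int]) -> int:
    """Physical RAM plus configured swap, converted from KiB to MiB."""
    return (fields.get("MemTotal", 0) + fields.get("SwapTotal", 0)) // 1024


def repair_argv(device: str, max_mem_mb: Optional[int] = None,
                no_prefetch: bool = False, dry_run: bool = False) -> List[str]:
    """Assemble (but do not run) an xfs_repair command line.

    -n: no-modify mode, only report problems
    -P: disable prefetching of inode and directory blocks
    -m: approximate maximum memory for xfs_repair, in megabytes
    """
    argv = ["xfs_repair"]
    if dry_run:
        argv.append("-n")
    if no_prefetch:
        argv.append("-P")
    if max_mem_mb is not None:
        argv += ["-m", str(max_mem_mb)]
    argv.append(device)
    return argv


# Example: a low-memory, read-only check of the device used in this thread.
print(" ".join(repair_argv("/dev/sdc", no_prefetch=True, dry_run=True)))
```

On a Linux host, passing the contents of /proc/meminfo to parse_meminfo gives the RAM + swap budget; the 4 GB RAM + 4 GB swap machine described above would come out at roughly 8192 MiB. Per Dave's reply, adding 10-20GB of swap, or turning prefetching off with -P, is the suggested response when repair runs out of memory.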
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: xfs corrupted
2013-10-16 13:32 ` Stefanita Rares Dumitrescu
2013-10-16 17:33 ` Keith Keller
@ 2013-10-16 22:16 ` Dave Chinner
1 sibling, 0 replies; 19+ messages in thread
From: Dave Chinner @ 2013-10-16 22:16 UTC (permalink / raw)
To: Stefanita Rares Dumitrescu; +Cc: xfs
On Wed, Oct 16, 2013 at 03:32:00PM +0200, Stefanita Rares Dumitrescu wrote:
> Quick update:
>
> The xfsprogs from the CentOS 6 yum repository are newer and they don't
> use that much memory; however, I got 2 segfaults and the process stopped.
>
> I cloned the xfsprogs git tree and am running it now with the new 15 GB
> swap that I created, and this one is a monster in memory usage.
>
> Pretty big discrepancy.
Not if the centos 6 version is segfaulting before it gets to the
stage that consumes all the memory. From your subsequent post, you
have 76 million inodes in the filesystem. If xfs_repair has to track
all those inodes as part of the recovery (e.g. you lost the root
directory), then it has to index them all in memory.
Most people have no idea how much disk space this amount of metadata
consumes, and hence don't realise why xfs_repair might run out of memory.
For example, a newly created 100TB filesystem with 50 million zero-length
files in it consumes 28GB of space in metadata.
You've got 50% more inodes than that, so xfs_repair is
probably walking in excess of 40GB of metadata in your filesystem.
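The estimate above is simple proportional scaling. A short Python sketch of that arithmetic, assuming metadata size grows linearly with inode count (28GB / 50 million is about 560 bytes per inode on the example filesystem); the function name is hypothetical, and because the reference point is a freshly created filesystem of zero-length files, an aged filesystem with data extents and directories may need more:

```python
# Reference point from the message above: a newly created 100TB filesystem
# with 50 million zero-length files uses 28GB of metadata.
# Assumption: metadata size scales linearly with the number of inodes.
REF_INODES = 50_000_000
REF_METADATA_GB = 28.0


def estimate_metadata_gb(inodes: int) -> float:
    """Scale the reference metadata size to a different inode count."""
    return REF_METADATA_GB * inodes / REF_INODES


inodes_here = 76_000_000  # inode count reported earlier in this thread
print(f"{inodes_here / REF_INODES:.2f}x the inodes")
print(f"~{estimate_metadata_gb(inodes_here):.1f} GB of metadata")
```

Run as-is it prints 1.52x and about 42.6 GB, which matches "50% more inodes" and "in excess of 40GB" above.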
If a significant portion of that metadata is corrupt, then repair
needs to hold both the suspicious metadata and a cross reference
index in memory to be able to rebuild it all. Hence when you have
tens of gigabytes of metadata, xfs_repair can need tens of GB of RAM
to be able to repair it. There's simply no easy way around this.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: xfs corrupted
2013-10-15 20:26 ` Dave Chinner
` (3 preceding siblings ...)
2013-10-16 20:52 ` Stefanita Rares Dumitrescu
@ 2013-10-17 18:04 ` Stefanita Rares Dumitrescu
4 siblings, 0 replies; 19+ messages in thread
From: Stefanita Rares Dumitrescu @ 2013-10-17 18:04 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Hi guys,
I'm finished, yay! After the array was rechecked and some errors were
fixed, I ran xfs_repair, left it overnight, and came back to a clean
system.
Thanks for all your support. Now I've learned that no matter what the RAID
card status says, I still need to force another integrity check after a
power failure, even if it says all is good.
Thread overview: 19+ messages
2013-10-15 8:41 xfs corrupted katmai
2013-10-15 18:34 ` Emmanuel Florac
2013-10-15 18:45 ` Stefanita Rares Dumitrescu
2013-10-15 19:07 ` Chris Murphy
2013-10-15 19:52 ` Emmanuel Florac
2013-10-15 19:34 ` Emmanuel Florac
2013-10-15 19:57 ` Stefanita Rares Dumitrescu
2013-10-15 20:05 ` Emmanuel Florac
2013-10-15 20:17 ` Stefanita Rares Dumitrescu
2013-10-15 20:18 ` Stefanita Rares Dumitrescu
2013-10-15 20:26 ` Dave Chinner
2013-10-16 12:23 ` Stefanita Rares Dumitrescu
2013-10-16 13:32 ` Stefanita Rares Dumitrescu
2013-10-16 17:33 ` Keith Keller
2013-10-16 22:16 ` Dave Chinner
2013-10-16 14:32 ` Stefanita Rares Dumitrescu
2013-10-16 20:52 ` Stefanita Rares Dumitrescu
2013-10-17 18:04 ` Stefanita Rares Dumitrescu
2013-10-15 20:02 ` Stefanita Rares Dumitrescu