* xfs corrupted
From: katmai @ 2013-10-15 8:41 UTC
To: xfs

Hi guys,

i have a problem. yesterday there was a power outage at one of my
datacenters, where i have a relatively large fileserver: 2 arrays,
1 x 14 tb and 1 x 18 tb, both in raid6, with an adaptec card.

after the outage, the server came back online, the xfs partitions were
mounted, and everything looked okay. i could access the data and
everything seemed just fine.

today i woke up to lots of i/o errors, and when i rebooted the server,
the partitions would not mount:

Oct 14 04:09:17 kp4 kernel:
Oct 14 04:09:17 kp4 kernel: XFS internal error XFS_WANT_CORRUPTED_RETURN a<ffffffff80056933>] pdflush+0x0/0x1fb
Oct 14 04:09:17 kp4 kernel: [<ffffffff80056a84>] pdflush+0x151/0x1fb
Oct 14 04:09:17 kp4 kernel: [<ffffffff800cd931>] wb_kupdate+0x0/0x16a
Oct 14 04:09:17 kp4 kernel: [<ffffffff80032c2b>] kthread+0xfe/0x132
Oct 14 04:09:17 kp4 kernel: [<ffffffff8005dfc1>] child_rip+0xa/0x11
Oct 14 04:09:17 kp4 kernel: [<ffffffff800a3ab7>] keventd_create_kthread+0x0/0xc4
Oct 14 04:09:17 kp4 kernel: [<ffffffff80032b2d>] kthread+0x0/0x132
Oct 14 04:09:17 kp4 kernel: [<ffffffff8005dfb7>] child_rip+0x0/0x11
Oct 14 04:09:17 kp4 kernel:
Oct 14 04:09:17 kp4 kernel: XFS internal error XFS_WANT_CORRUPTED_RETURN at line 279 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff88342331
Oct 14 04:09:17 kp4 kernel:

got a bunch of these in dmesg.

i googled for solutions and i think i jumped the gun by doing
xfs_repair -L /dev/sdc. it would not clean it with xfs_repair /dev/sdc,
and everybody pretty much says the same thing.

this is what i was getting when trying to mount the array:

Filesystem Corruption of in-memory data detected. Shutting down filesystem
xfs_check

Did i jump the gun by using the -L switch :/ ?
-- View this message in context: http://xfs.9218.n7.nabble.com/xfs-corrupted-tp35009.html Sent from the Xfs - General mailing list archive at Nabble.com. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: xfs corrupted
From: Emmanuel Florac @ 2013-10-15 18:34 UTC
To: katmai; +Cc: xfs

On Tue, 15 Oct 2013 01:41:47 -0700 (PDT), you wrote:

> Did i jump the gun by using the -L switch :/ ?

You should have checked that the RAID is optimal first! In case of
failing hardware, any write to the volume can exacerbate problems.

You should use arcconf to check the RAID state (arcconf getstatus 1)
and, if needed, run a RAID repair (arcconf task start 1 logicaldrive 0
verify_fix).

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |   <eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------
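The order of checks described above can be written down as a small dry-run helper. This is only a sketch: it prints the commands rather than running them, the function name is made up for illustration, and the controller number (1), logical drive (0) and device (/dev/sdc) are simply the values used in this thread. The last step uses xfs_repair's no-modify mode (-n), which reports problems without writing to the volume.

```shell
#!/bin/sh
# Dry-run sketch: print the checks to do BEFORE any destructive repair.
# pre_repair_checks is a hypothetical helper name; nothing is executed.
pre_repair_checks() {
    controller=$1      # Adaptec controller number, e.g. 1
    logical_drive=$2   # logical drive on that controller, e.g. 0
    device=$3          # block device holding the XFS filesystem

    # 1. Ask the controller for the state of the array.
    echo "arcconf getstatus $controller"
    # 2. If needed, let the controller verify and fix the array.
    echo "arcconf task start $controller logicaldrive $logical_drive verify_fix"
    # 3. Only then look at the filesystem, read-only (-n = no modify).
    echo "xfs_repair -n $device"
}

pre_repair_checks 1 0 /dev/sdc
```

Printing instead of executing keeps the sketch safe to try on any machine; the commands can be copied and run by hand once the values are right for the system at hand.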
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-15 18:45 UTC
To: Emmanuel Florac; +Cc: xfs

That was the first thing i checked: the array was optimal, and i
checked each drive with smartctl, and they are all fine.

I left the xfs_repair on for the night, and it showed no progress. I
was actually thinking that maybe the memory is bad, so i took the
server offline this morning, and ran a memtest for 3 hours, which
showed nothing wrong with the sticks, however good news:

I was able to mount the array, but i can only read from it. Whenever i
try to write something, it just hangs right there.

I ran an xfs_repair -n on the second array, which is 18 tb in size as
opposed to the 14 tb first one, and that check completed in like 10
minutes.

I am running now xfs_repair -n on the 14 tb bad array, and it's stuck
here for about 5 hours now.

[root@kp4 ~]# umount /home
[root@kp4 ~]# xfs_repair -n /dev/sdc
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0

What worries me is that i see 100 % cpu usage, some 74 % memory usage
(i have 4 gb ram) but there is no disk activity at all. I was thinking
that it would be at least some reads if the xfs_repair is doing
something.
* Re: xfs corrupted
From: Chris Murphy @ 2013-10-15 19:07 UTC
To: xfs@oss.sgi.com

On Oct 15, 2013, at 12:45 PM, Stefanita Rares Dumitrescu <katmai@keptprivate.com> wrote:

> What worries me is that i see 100 % cpu usage, some 74 % memory usage
> (i have 4 gb ram) but there is no disk activity at all. I was
> thinking that it would be at least some reads if the xfs_repair is
> doing something.

That is very low RAM for a system with two big arrays attached. So if
repair finds it needs to repair something it's going to take a long
time.

http://xfs.org/index.php/XFS_FAQ#Q:_Which_factors_influence_the_memory_usage_of_xfs_repair.3F

Chris Murphy
* Re: xfs corrupted
From: Emmanuel Florac @ 2013-10-15 19:52 UTC
To: Chris Murphy; +Cc: xfs@oss.sgi.com

On Tue, 15 Oct 2013 13:07:22 -0600, you wrote:

> That is very low RAM for a system with two big arrays attached. So if
> repair finds it needs to repair something it's going to take a long
> time.
> http://xfs.org/index.php/XFS_FAQ#Q:_Which_factors_influence_the_memory_usage_of_xfs_repair.3F

With a recent xfs_repair (3.x) it's large enough. I've checked similar
arrays recently on 4 GB machines in a couple of minutes.
* Re: xfs corrupted
From: Emmanuel Florac @ 2013-10-15 19:34 UTC
To: Stefanita Rares Dumitrescu; +Cc: xfs

On Tue, 15 Oct 2013 20:45:59 +0200, you wrote:

> What worries me is that i see 100 % cpu usage, some 74 % memory usage
> (i have 4 gb ram) but there is no disk activity at all. I was
> thinking that it would be at least some reads if the xfs_repair is
> doing something.

What does the "iostat -mx 5" output look like? Is there a lot of IO
wait? Or just no activity at all? Nothing in dmesg output?
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-15 19:57 UTC
To: Emmanuel Florac; +Cc: xfs

Since i am using centos 5.9, the version of the xfsprogs seems to be
old, so i cloned the new one from sgi.

I have a machine with 4 gb ram, and 4 gb swap, and it's all been eaten
up by xfs_repair, and slowed down to a crawl.

the sdc partition is the one being checked. i am all out of memory
now. 4 gb phys and 4 gb swap all gone.

http://pastebin.ca/2467064

posted to pastebin for better formatting.

i was using:

[root@kp4 ~]# xfs_repair -o bhash=16384 -o ihash=16384 -o ag_stride=16 \
> /dev/sdc >& /tmp/repair.log

but now i am trying the -m option to see if the memory can be limited,
so the server doesn't freeze.

[root@kp4 ~]# xfs_repair -m 3072 -o ag_stride=16 /dev/sdc >& /tmp/repair.log

nothing in dmesg either.
* Re: xfs corrupted
From: Emmanuel Florac @ 2013-10-15 20:05 UTC
To: Stefanita Rares Dumitrescu; +Cc: xfs

On Tue, 15 Oct 2013 21:57:47 +0200, you wrote:

> I have a machine with 4 gb ram, and 4 gb swap, and it's all been
> eaten up by xfs_repair, and slowed down to a crawl.
>
> the sdc partition is the one being checked. i am all out of memory
> now. 4 gb phys and 4 gb swap all gone.
>
> http://pastebin.ca/2467064
>
> posted to pastebin for better formatting.
>
> i was using:
>
> [root@kp4 ~]# xfs_repair -o bhash=16384 -o ihash=16384 -o ag_stride=16 \
> > /dev/sdc >& /tmp/repair.log
>
> but now i am trying the -m option to see if the memory can be
> limited, so the server doesn't freeze.

Or maybe you could turn the swap off entirely for the check.
Apparently all of the IOs are going to sda, which I suppose hosts the
swap.
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-15 20:17 UTC
To: Emmanuel Florac; +Cc: xfs

Hmm that never occurred to me. I just turned the swap off and i am
trying again.
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-15 20:18 UTC
To: Emmanuel Florac; +Cc: xfs

well that did not work. the machine just freezes.
* Re: xfs corrupted
From: Dave Chinner @ 2013-10-15 20:26 UTC
To: Stefanita Rares Dumitrescu; +Cc: xfs

On Tue, Oct 15, 2013 at 09:57:47PM +0200, Stefanita Rares Dumitrescu wrote:
> Since i am using centos 5.9, the version of the xfsprogs seems to be
> old, so i cloned the new one from sgi.
>
> I have a machine with 4 gb ram, and 4 gb swap, and it's all been
> eaten up by xfs_repair, and slowed down to a crawl.
>
> the sdc partition is the one being checked. i am all out of memory
> now. 4 gb phys and 4 gb swap all gone.
>
> http://pastebin.ca/2467064
>
> posted to pastebin for better formatting.
>
> i was using:
>
> [root@kp4 ~]# xfs_repair -o bhash=16384 -o ihash=16384 -o ag_stride=16 \
> > /dev/sdc >& /tmp/repair.log

You don't have enough RAM to run threaded prefetching and parallel
AG processing. You'd do better to turn prefetching off entirely with
"-P" if you are having OOM problems.

> but now i am trying the -m option to see if the memory can be
> limited, so the server doesn't freeze.
>
> [root@kp4 ~]# xfs_repair -m 3072 -o ag_stride=16 /dev/sdc >& /tmp/repair.log
>
> nothing in dmesg either.

Give it another 10-20GB of swap, and it should be fine. xfs_repair
usually only thrashes swap when you don't have enough of it and it
keeps trying to free memory, paging in pages that are in swap to
free cached objects from them. Most of the memory references that
repair makes are quite local, so when pages are swapped out they
generally aren't needed again for a while except when cache reclaim
kicks in. Hence if you give it enough swap that it can grow without
bounds, then it should still be quite efficient.

Keep in mind that badly corrupted filesystems require lots more
memory than clean filesystems to check and repair as there is lots
more intermediate state that repair needs to hold in memory about
partially or incompletely referenced objects. Don't be surprised if
the amount of memory needed to repair a badly broken filesystem is
10-100x the amount of RAM needed to run xfs_repair on the same clean
filesystem....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
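The two suggestions above (more swap, and "-P" to turn prefetching off) could be combined roughly as follows. This is a sketch under assumptions, not a tested recipe: the swap file path /var/swapfile and the 16 GB size are made up for illustration (16 falls inside the suggested 10-20GB range), the helper name is hypothetical, and the commands are printed rather than executed so nothing touches the disk until they are reviewed.

```shell
#!/bin/sh
# Dry-run sketch: print the commands to add a swap file and then start
# xfs_repair without prefetching. plan_swap_repair is a hypothetical name.
plan_swap_repair() {
    swapfile=$1   # assumed path for the new swap file
    size_gb=$2    # extra swap in GB
    device=$3     # device holding the damaged filesystem

    count=$((size_gb * 1024))   # number of 1 MiB blocks for dd
    echo "dd if=/dev/zero of=$swapfile bs=1M count=$count"
    echo "chmod 600 $swapfile"
    echo "mkswap $swapfile"
    echo "swapon $swapfile"
    # -P disables prefetching, which lowers memory pressure
    echo "xfs_repair -P $device"
}

plan_swap_repair /var/swapfile 16 /dev/sdc
```

A swap file is used here only because it avoids repartitioning; a dedicated swap partition on another disk works the same way from xfs_repair's point of view.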
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-16 12:23 UTC
To: Dave Chinner; +Cc: xfs

I have a small 40 gb ssd and i tried to do some smart stuff by
resizing the os partition to try to increase the swap, and i botched
it, so i quickly reloaded centos6 today.

I have all the important stuff backed up, so it did not really matter
if i reloaded or not, however:

I made a 14 gb swap partition, but on centos6 the xfs_repair doesn't
even go above 2.7 gb, and none of the swap is used.

I am using xfsprogs-3.1.1-10.el6_4.1.x86_64

So far so good, i see a lot of reads on the botched array, but just to
be safe i mounted it first, and tested if i could read the data, and
it was all fine.

I will keep you updated. Hopefully i can get this over with.
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-16 13:32 UTC
To: Dave Chinner; +Cc: xfs

Quick update:

The xfsprogs from the centos6 yum are newer and they don't use that
much memory, however i got 2 segfaults and the process stopped.

I cloned the xfsprogs git and i am running it now with the new 15 gb
swap that i created, and this is a monster in memory usage.

Pretty bit of discrepancy.
* Re: xfs corrupted
From: Keith Keller @ 2013-10-16 17:33 UTC
To: linux-xfs

On 2013-10-16, Stefanita Rares Dumitrescu <katmai@keptprivate.com> wrote:
>
> The xfsprogs from the centos6 yum are newer

They are certainly newer than from CentOS 5, but are still reasonably
old compared to git. You should probably prefer the latest stable
version or the git version over what's available from yum by default.

> I cloned the xfsprogs git and i am running it now with the new 15 gb
> swap that i created, and this is a monster in memory usage.
>
> Pretty bit of discrepancy.

As others have suggested, lots of memory use is to be expected with
the size of the filesystem and the amount of memory you have. Did you
use the -P switch as Dave suggested? I have found it very helpful in
low-memory situations.

--keith

-- 
kkeller@wombat.san-francisco.ca.us
* Re: xfs corrupted
From: Dave Chinner @ 2013-10-16 22:16 UTC
To: Stefanita Rares Dumitrescu; +Cc: xfs

On Wed, Oct 16, 2013 at 03:32:00PM +0200, Stefanita Rares Dumitrescu wrote:
> Quick update:
>
> The xfsprogs from the centos6 yum are newer and they don't use that
> much memory, however i got 2 segfaults and the process stopped.
>
> I cloned the xfsprogs git and i am running it now with the new 15 gb
> swap that i created, and this is a monster in memory usage.
>
> Pretty bit of discrepancy.

Not if the centos 6 version is segfaulting before it gets to the
stage that consumes all the memory. From your subsequent post, you
have 76 million inodes in the filesystem. If xfs_repair has to track
all those inodes as part of the recovery (e.g. you lost the root
directory), then it has to index them all in memory.

Most people have no idea how much disk space this amount of metadata
consumes and hence why xfs_repair might run out of memory. For
example, a newly created 100TB filesystem with 50 million zero
length files in it consumes 28GB of space in metadata. You've got
50% more inodes than that, so xfs_repair is probably walking in
excess of 40GB of metadata in your filesystem. If a significant
portion of that metadata is corrupt, then repair needs to hold both
the suspicious metadata and a cross reference index in memory to be
able to rebuild it all.

Hence when you have tens of gigabytes of metadata, xfs_repair can
need tens of GB of RAM to be able to repair it. There's simply no
easy way around this.

Cheers,

Dave.
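The estimate in the message above can be reproduced with a line of integer arithmetic. This is a back-of-the-envelope sketch only: it assumes metadata size scales linearly with inode count from the quoted reference point (50 million zero-length files, 28GB of metadata), which the message implies but does not state as a rule, and the function name is made up for illustration. The inode count 76088384 is the one reported by xfs_repair in this thread.

```shell
#!/bin/sh
# Rough linear scaling from the reference point in the message:
# 50,000,000 inodes -> about 28 GB of metadata. Real filesystems with
# non-empty files and directories will differ; treat this as a floor.
estimate_metadata_gb() {
    inodes=$1
    # integer division, result truncated to whole GB
    echo $(( inodes * 28 / 50000000 ))
}

estimate_metadata_gb 76088384   # prints 42
```

The result, about 42 GB, matches the "in excess of 40GB" figure given above, and makes it clear why 4 GB of RAM plus 4 GB of swap was nowhere near enough once a large share of that metadata had to be cross-referenced in memory.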
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-16 14:32 UTC
To: Dave Chinner; +Cc: xfs

I have been running the xfs_repair from the sgi repo for quite a while
now, and it keeps chugging memory, however there seems to be no
progress :/

doubling cache size to 1048576
        - 09:27:05: process known inodes and inode discovery - 0 of 76088384 inodes done
        - 09:42:05: process known inodes and inode discovery - 0 of 76088384 inodes done
doubling cache size to 2097152
        - 09:57:05: process known inodes and inode discovery - 0 of 76088384 inodes done
        - 10:12:05: process known inodes and inode discovery - 0 of 76088384 inodes done
        - 10:27:05: process known inodes and inode discovery - 0 of 76088384 inodes done

Using the xfsprogs from yum got over this, but it segfaulted. I am
going to give it a little bit more time ... 4 more gb of swap left.
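One way to tell whether a long run is moving at all is to pull the inode counter out of progress lines like the ones in the message above. The helper below is a sketch that assumes the lines have exactly the format shown there; the function name is hypothetical, and in practice the input would be the log file the repair output was redirected to (for example /tmp/repair.log, as in the earlier commands).

```shell
#!/bin/sh
# Read xfs_repair progress lines on stdin and print "done total" taken
# from the most recent "N of M inodes done" line.
repair_progress() {
    sed -n 's/.* - \([0-9][0-9]*\) of \([0-9][0-9]*\) inodes done.*/\1 \2/p' \
        | tail -n 1
}

# Example with one of the lines quoted in the message:
printf '%s\n' \
    '        - 09:27:05: process known inodes and inode discovery - 0 of 76088384 inodes done' \
    | repair_progress   # prints: 0 76088384
```

If the first number stays at 0 across many 15-minute reports while memory keeps growing, as it did here, the run is still building its in-memory state rather than processing inodes.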
* Re: xfs corrupted 2013-10-15 20:26 ` Dave Chinner ` (2 preceding siblings ...) 2013-10-16 14:32 ` Stefanita Rares Dumitrescu @ 2013-10-16 20:52 ` Stefanita Rares Dumitrescu 2013-10-17 18:04 ` Stefanita Rares Dumitrescu 4 siblings, 0 replies; 19+ messages in thread From: Stefanita Rares Dumitrescu @ 2013-10-16 20:52 UTC (permalink / raw) To: Dave Chinner; +Cc: xfs another quick update: after reloading centos 6, i noticed that both arrays were in verifying status, so i stopped xfs_repair to see if the raid array has some inconsistencies, which it did, and repaired. so my note here is that even if the arrays show okay, you should force verify after a power outage. now the array verify has completed, and some errors were fixed, so i am running xfs_repair once more on the broken array. to keep note, i can now write on the array without issues, lag or whatever. On 15/10/2013 22:26, Dave Chinner wrote: > On Tue, Oct 15, 2013 at 09:57:47PM +0200, Stefanita Rares Dumitrescu wrote: >> Since i am using centos 5.9, the version of the xfsprogs seems to be >> old, so i cloned the new one from sgi. >> >> I have a machine with 4 gb ram, and 4 gb swap, and it's all been >> eaten up by xfs_repair, and slowed down to a crawl. >> >> the sdc partition is the one being checked. i am all out of memory >> now. 4 gb phys and 4 gb swap all gone. >> >> http://pastebin.ca/2467064 >> >> posted to pastebin for better formatting. >> >> i was using: >> >> [root@kp4 ~]# xfs_repair -o bhash=16384 -o ihash=16384 -o ag_stride=16 \ >>> /dev/sdc >& /tmp/repair.log > > You don't have enough RAM to run threaded prefetching and parallel > AG processing. You'd do better to turn prefetching off entirely with > "-P" if you are having OOM problems. > >> but now i am trying the -m option to see if the memory can be >> limited, so the server doesn't freeze. >> >> [root@kp4 ~]# xfs_repair -m 3072 -o ag_stride=16 /dev/sdc >& /tmp/repair.log >> >> nothing in dmesg either. 
>
> Give it another 10-20GB of swap, and it should be fine. xfs_repair
> usually only thrashes swap when you don't have enough of it and it
> keeps trying to free memory, paging in pages that are in swap to
> free cached objects from them. Most of the memory references that
> repair makes are quite local, so when pages are swapped out they
> generally aren't needed again for a while except when cache reclaim
> kicks in. Hence if you give it enough swap that it can grow without
> bounds, then it should still be quite efficient.
>
> Keep in mind that badly corrupted filesystems require lots more
> memory than clean filesystems to check and repair as there is lots
> more intermediate state that repair needs to hold in memory about
> partially or incompletely referenced objects. Don't be surprised if
> the amount of memory needed to repair a badly broken filesystem is
> 10-100x the amount of RAM needed to run xfs_repair on the same clean
> filesystem....
>
> Cheers,
>
> Dave.
>
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-17 18:04 UTC
To: Dave Chinner; +Cc: xfs

Hi guys,

I am finished, yay!

After the array got rechecked and some errors were fixed, i ran
xfs_repair and left it overnight, and i came back to a clean system.

Thanks for all your support. Now i have learned that no matter what the
raid card status says, i still need to force another integrity check
after a power failure, even if it says all is good.

On 15/10/2013 22:26, Dave Chinner wrote:
> On Tue, Oct 15, 2013 at 09:57:47PM +0200, Stefanita Rares Dumitrescu wrote:
>> Since i am using centos 5.9, the version of the xfsprogs seems to be
>> old, so i cloned the new one from sgi.
>>
>> I have a machine with 4 gb ram, and 4 gb swap, and it's all been
>> eaten up by xfs_repair, and slowed down to a crawl.
>>
>> the sdc partition is the one being checked. i am all out of memory
>> now. 4 gb phys and 4 gb swap all gone.
>>
>> http://pastebin.ca/2467064
>>
>> posted to pastebin for better formatting.
>>
>> i was using:
>>
>> [root@kp4 ~]# xfs_repair -o bhash=16384 -o ihash=16384 -o ag_stride=16 \
>>> /dev/sdc >& /tmp/repair.log
>
> You don't have enough RAM to run threaded prefetching and parallel
> AG processing. You'd do better to turn prefetching off entirely with
> "-P" if you are having OOM problems.
>
>> but now i am trying the -m option to see if the memory can be
>> limited, so the server doesn't freeze.
>>
>> [root@kp4 ~]# xfs_repair -m 3072 -o ag_stride=16 /dev/sdc >& /tmp/repair.log
>>
>> nothing in dmesg either.
>
> Give it another 10-20GB of swap, and it should be fine. xfs_repair
> usually only thrashes swap when you don't have enough of it and it
> keeps trying to free memory, paging in pages that are in swap to
> free cached objects from them.
> Most of the memory references that
> repair makes are quite local, so when pages are swapped out they
> generally aren't needed again for a while except when cache reclaim
> kicks in. Hence if you give it enough swap that it can grow without
> bounds, then it should still be quite efficient.
>
> Keep in mind that badly corrupted filesystems require lots more
> memory than clean filesystems to check and repair as there is lots
> more intermediate state that repair needs to hold in memory about
> partially or incompletely referenced objects. Don't be surprised if
> the amount of memory needed to repair a badly broken filesystem is
> 10-100x the amount of RAM needed to run xfs_repair on the same clean
> filesystem....
>
> Cheers,
>
> Dave.
>
* Re: xfs corrupted
From: Stefanita Rares Dumitrescu @ 2013-10-15 20:02 UTC
To: Emmanuel Florac; +Cc: xfs

-m maxmem
    Specifies the approximate maximum amount of memory, in megabytes,
    to use for xfs_repair. xfs_repair has its own internal block cache
    which will scale out up to the lesser of the process’s virtual
    address limit or about 75% of the system’s physical RAM. This
    option overrides these limits.

    NOTE: These memory limits are only approximate and may use more
    than the specified limit.

I set this to a 3 gb limit, but 2.5 gb of swap is already used and it's
still going up :/

On 15/10/2013 21:34, Emmanuel Florac wrote:
> On Tue, 15 Oct 2013 20:45:59 +0200 you wrote:
>
>> What worries me is that i see 100 % cpu usage, some 74 % memory usage
>> (i have 4 gb ram) but there is no disk activity at all. I was
>> thinking that it would be at least some reads if the xfs_repair is
>> doing something.
>
> What does "iostat -mx 5" output looks like? Is there a lot of IO wait?
> Or just no activity at all? Nothing in dmesg output?
>
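To put the "-m maxmem" text quoted in the message above into numbers, here is a small sketch that estimates the "about 75% of physical RAM" ceiling in megabytes. The helper name default_cache_mb is made up for illustration, and the sketch ignores the virtual address limit that the man page also mentions, so treat the result as a rough estimate only.

```shell
#!/bin/sh
# Rough estimate of the "about 75% of physical RAM" figure from the
# xfs_repair man page text quoted above. default_cache_mb is a
# hypothetical helper name, not part of xfsprogs.
default_cache_mb() {
    # $1 = physical RAM in kB; result = 75% of it, in MB (integer math)
    echo $(( $1 * 3 / 4 / 1024 ))
}

# On Linux, MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    if [ -n "$mem_kb" ]; then
        echo "approx. default cache ceiling: $(default_cache_mb "$mem_kb") MB"
    fi
fi
```

For a nominal 4 GB machine (4194304 kB) the helper gives 3072, so by this estimate "-m 3072" is about the same as the default ceiling rather than a tighter limit. As the man page note says, these limits are approximate either way.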