From: Christoph Hellwig <hch@infradead.org>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com, Joe Landman <landman@scalableinformatics.com>
Subject: Re: bug: xfs_repair becomes very slow when file system has a large sparse file
Date: Wed, 7 Sep 2011 06:43:01 -0400
Message-ID: <20110907104301.GA31971@infradead.org>
In-Reply-To: <20110820020004.GG32358@dastard>
Joe,
do you have any of the updates Dave asked for?
On Sat, Aug 20, 2011 at 12:00:04PM +1000, Dave Chinner wrote:
> On Fri, Aug 19, 2011 at 08:38:53PM -0400, Joe Landman wrote:
> > On 8/19/2011 8:26 PM, Dave Chinner wrote:
> > >On Fri, Aug 19, 2011 at 12:37:05PM -0400, Joe Landman wrote:
> > >>(If you would prefer we file this in a bug reporting system, please
> > >>let me know where and I'll do so.)
> > >>
> > >>Scenario: xfs_repair being run against a roughly 17TB volume
> > >>containing one large sparse file. Logical size of 7 PB; actual size,
> > >>a few hundred GB.
> > >>
> > >>Metadata: Kernel = 2.6.32.41, 2.6.39.4, and others. xfsprogs 3.1.5.
> > >>Hardware RAID ~17TB LUN. Base OS: CentOS 5.6 + updates + updated
> > >>xfs tools + our kernels. Using an external journal on a different
> > >>device.
> > >>
> > >>What we observe:
> > >>
> > >>Running xfs_repair:
> > >>
> > >> xfs_repair -l /dev/md2 -vv /dev/sdd2
> > >
> > >can you post the actual output of xfs_repair?
> >
> > [root@jr4-2 ~]# xfs_repair -l /dev/md2 -vv /dev/sdd2
> > Phase 1 - find and verify superblock...
> > - max_mem = 37094400, icount = 1346752, imem = 5260, dblock = 4391112384, dmem = 2144097
> > - block cache size set to 4361880 entries
> > Phase 2 - using external log on /dev/md2
> > - zero log...
> > zero_log: head block 126232 tail block 126232
> > - scan filesystem freespace and inode maps...
> > agf_freeblks 11726908, counted 11726792 in ag 1
> > sb_ifree 2366, counted 2364
> > sb_fdblocks 2111548832, counted 2111548716
> > - found root inode chunk
> > libxfs_bcache: 0x8804c0
> > Max supported entries = 4361880
> > Max utilized entries = 4474
> > Active entries = 4474
> > Hash table size = 545235
> > Hits = 0
> > Misses = 4474
> > Hit ratio = 0.00
> > MRU 0 entries = 4474 (100%)
> > MRU 1 entries = 0 ( 0%)
> > MRU 2 entries = 0 ( 0%)
> > MRU 3 entries = 0 ( 0%)
> > MRU 4 entries = 0 ( 0%)
> > MRU 5 entries = 0 ( 0%)
> > MRU 6 entries = 0 ( 0%)
> > MRU 7 entries = 0 ( 0%)
> > MRU 8 entries = 0 ( 0%)
> > MRU 9 entries = 0 ( 0%)
> > MRU 10 entries = 0 ( 0%)
> > MRU 11 entries = 0 ( 0%)
> > MRU 12 entries = 0 ( 0%)
> > MRU 13 entries = 0 ( 0%)
> > MRU 14 entries = 0 ( 0%)
> > MRU 15 entries = 0 ( 0%)
> > Hash buckets with 0 entries 541170 ( 0%)
> > Hash buckets with 1 entries 3765 ( 84%)
> > Hash buckets with 2 entries 242 ( 10%)
> > Hash buckets with 3 entries 15 ( 1%)
> > Hash buckets with 4 entries 36 ( 3%)
> > Hash buckets with 5 entries 6 ( 0%)
> > Hash buckets with 6 entries 1 ( 0%)
> > Phase 3 - for each AG...
> > - scan and clear agi unlinked lists...
> > - process known inodes and perform inode discovery...
> > - agno = 0
> > bad magic number 0xc88 on inode 5034047
> > bad version number 0x40 on inode 5034047
> > bad inode format in inode 5034047
> > correcting nblocks for inode 5034046, was 185195 - counted 0
> > bad magic number 0xc88 on inode 5034047, resetting magic number
> > bad version number 0x40 on inode 5034047, resetting version number
> > bad inode format in inode 5034047
> > cleared inode 5034047
>
> That doesn't look good - something has trashed an inode cluster by
> the look of it. Was this why you ran xfs_repair?
>
> FWIW, do you know what the inode number of the large file was? I'm
> wondering if it was in the same cluster as the above inode and so
> was corrupted in some way that caused repair to head off into la-la
> land....
>
> > >What is the CPU usage when this happens? How much memory do you
> >
> > Very low. The machine is effectively idle, user load of 0.01 or so.
>
> OK, so repair wasn't burning up an entire CPU walking/searching
> lists?
>
> > >>This isn't a 7PB file system, it's a 100TB file system across 3
> > >>machines, roughly 17TB per brick or OSS. The Gau-00000.rwf is
> > >>obviously a sparse file, as can be seen with an ls -alsF.
> > >
> > >What does du tell you about it? xfs_io -f -c "stat" <large file>?
> > >xfs_bmap -vp <large file>?
> >
> > ls -alsF told me it was a few hundred GB. du gave a similar number.
>
> OK - the other commands, however, tell me more than just the disk
> blocks used - they also tell me how many extents the file has and
> how they are laid out, which is what I really need to know about
> that sparse file. It will also help me recreate a file with a
> similar layout to see whether xfs_repair chokes on it here too, or
> whether the problem was specific to the corruption it encountered....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
---end quoted text---
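
For reference, the commands Dave asked for would be along these lines
(the file name Gau-00000.rwf is taken from Joe's report, so adjust the
path as needed; the output on the affected system will of course differ):

  # apparent size vs. space actually allocated (GNU du)
  du -h --apparent-size Gau-00000.rwf
  du -h Gau-00000.rwf

  # inode number, size in bytes and blocks, and extent count
  xfs_io -c "stat" Gau-00000.rwf

  # full extent map, including holes and unwritten/preallocated extents
  xfs_bmap -vp Gau-00000.rwf

As a crude first approximation for recreating a file with a huge apparent
size but little allocated space (the real extent layout still needs the
bmap output above; /mnt/test/sparse-test is just a placeholder path):

  # roughly 7 PB apparent size, 64MB of real data at offset 0
  xfs_io -f -c "truncate $((7 * 2**50))" -c "pwrite 0 64m" /mnt/test/sparse-test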
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs