From: Michael Monnerie <michael.monnerie@is.it-management.at>
To: xfs@oss.sgi.com
Subject: Re: xfs_fsr question for improvement
Date: Mon, 3 May 2010 08:49:43 +0200
Message-ID: <201005030849.47591@zmi.at>
In-Reply-To: <20100417012415.GE2493@dastard>



On Saturday, 17 April 2010 Dave Chinner wrote:
> They have thousands of extents in them and they are all between
> 8-10GB in size, and IO from my VMs is still capable of saturating
> the disks backing these files. While I'd normally consider these
> files fragmented and candidates for running fsr on them, the number
> of extents is not actually a performance limiting factor and so
> there's no point in defragmenting them. Especially as that requires
> shutting down the VMs...

I personally care less about file fragmentation than about
metadata/inode/directory fragmentation. This server is accessed by
numerous people, and lookups like this one are slow:

# time find /mountpoint/ -inum 107901420
/mountpoint/some/dir/ectory/path/x.iso

real    7m50.732s
user    0m0.152s
sys     0m2.376s

It took nearly 8 minutes to search through that mount point, which is
6TB in size on a RAID-5 striped over seven 2TB disks, so search speed
should be high. Especially as there are only about 765,000 files on
that filesystem (df -i):
Filesystem            Inodes   IUsed   IFree IUse%
/mountpoint           1258291200  765659 1257525541    1%

Wouldn't you say an 8-minute search over just 765,000 files is slow,
even when only using 7x 2TB 7200rpm disks in RAID-5?
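
(An aside for anyone trying the same: if I remember the incantation
right, xfs_db can build the inode-to-path mapping from the metadata
alone instead of walking the whole directory tree. A sketch, where the
device path is a placeholder and the syntax may vary by version:

# xfs_db -r -c 'blockget -n' -c 'ncheck' /dev/vg0/lv0 | grep 107901420

No idea whether that would actually be faster on this box.)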

> > Would it be possible xfs_fsr defrags the meta data in a way that
> > they are all together so seeks are faster?
> 
> It's not related to fsr because fsr does not defragment metadata.
> Some metadata cannot be defragmented (e.g. inodes cannot be moved),
> some metadata cannot be manipulated directly (e.g. free space
> btrees), and some is just difficult to do (e.g. directory
> defragmentation) so hasn't ever been done.

I see. On this particular server I know it would be good for
performance to have the metadata defragmented, but that's not the aim
of xfs_fsr. Maybe some developer will get bored one day and find a way
to speed up searching and finding files on an aged filesystem, i.e.
metadata defragmentation :-)

I ran this twice:
# time find /mountpoint/ -inum 107901420
real    8m17.316s
user    0m0.148s 
sys     0m1.964s 

# time find /mountpoint/ -inum 107901420
real    0m30.113s
user    0m0.540s 
sys     0m9.813s 

Caching helps the 2nd time :-)
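
For anyone reproducing this: instead of remounting, the page, dentry
and inode caches can be dropped to get a cold-cache timing again. A
sketch, needs root:

# sync
# echo 3 > /proc/sys/vm/drop_caches
# time find /mountpoint/ -inum 107901420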
 
> > Currently, when I do "find /this_big_fs -inum 1234", it takes
> > *ages* for a run, while there are not so many files on it:
> > # iostat -kx 5 555
> > Device:    r/s   rkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
> > xvdb     23,20   92,80      8,00      0,42  15,28  18,17  42,16
> > xvdc     20,20   84,00      8,32      0,57  28,40  28,36  57,28
> 
> Well, it's not XFS's fault that each read IO is taking 20-30ms. You
> can only do 30-50 IOs a second per drive at that rate, so:
> 
> [...]
> 
> > So I get 43 reads/second at 100% utilization. Well I can see up to
> 
> This is right on the money - it's going as fast as your (slow)
> RAID-5 volume will allow it to....
> 
> > 150r/s, but still that's no "wow". A single run to find an inode
> > takes a very long time.
> 
> RAID 5/6 generally provides the same IOPS performance as a single
> spindle, regardless of the width of the RAID stripe. A 2TB SATA
> drive might be able to do 150-200 IOPS, so a RAID5 array made up of
> these drives will tend to max out at roughly the same....
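
A quick sanity check on that: at the 20-30ms per read shown by await
above, one spindle's worth is roughly 1000ms / 25ms = 40 IOPS, which
matches the ~43 reads/second I measured.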

Running xfs_fsr, I can see up to 1200 reads + 1200 writes = 2400 I/Os
per second:

Device:   rrqm/s  wrqm/s      r/s      w/s     rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
xvdc        0,00    0,00     0,00  1191,42      0,00  52320,16     87,83    121,23   96,77   0,71  84,63
xvde        0,00    0,00  1226,35     0,00  52324,15      0,00     85,33      0,77    0,62   0,13  15,33

But on average it's about 600-700 reads plus 600-700 writes per
second, so 1200-1400 IOPS in total.
Both "disks" are 2TB LVM volumes on the same raidset; I had to split
it because Xen doesn't allow creating volumes larger than 2TB.

So the badly slow I/O I see during "find" does not happen during fsr.
How can that be? (Perhaps because fsr issues large, mostly sequential
requests - avgrq-sz of ~85-88 sectors, i.e. ~43-45kB, above - while
"find" does small random reads of ~8-9 sectors, i.e. ~4-5kB.)

I'm now running another "find" on a freshly remounted XFS, and I can
see the reads happening on two of the three 2TB volumes in parallel:
Device:      r/s    w/s   rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
xvdb      103,20   0,00  476,80   0,00      9,24      0,46   4,52   4,50  46,40
xvdc       97,80   0,00  455,20   0,00      9,31      0,52   5,29   5,30  51,84

When I created that XFS, I took two 2TB partitions and did pvcreate,
vgcreate and lvcreate. Could it be that lvcreate automatically decided
to do a RAID-0? All reads are split equally between those two volumes.
I added the 3rd 2TB volume later, and I don't see that behaviour
there. So maybe this is the source of all evil. (A way to check is
sketched below.)
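
As far as I know, lvcreate builds a linear LV by default and only
stripes when asked to with -i/--stripes, so a RAID-0 layout should
only exist if it was requested. A sketch of both cases and of how to
inspect an existing LV (vg0/lv0 and the device names are placeholders):

# pvcreate /dev/xvdb1 /dev/xvdc1
# vgcreate vg0 /dev/xvdb1 /dev/xvdc1
# lvcreate -L 4T -n lv0 vg0          (linear - the default)
# lvcreate -L 4T -i 2 -n lv0 vg0     (striped across 2 PVs)
# lvs -o +stripes,stripe_size vg0    (shows stripe count and size)
# lvdisplay -m /dev/vg0/lv0          (shows the segment layout)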

BTW: I changed the mount options "atime,diratime" to
"relatime,reldiratime" now, and the "find" runtime went from 8 minutes
down to 7m14s.

-- 
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services
http://proteger.at [pronounced: Prot-e-schee]
Tel: 0660 / 415 65 31

// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/



