From: Michael Monnerie
To: xfs@oss.sgi.com
Subject: Re: xfs_fsr question for improvement
Date: Mon, 3 May 2010 08:49:43 +0200
Message-Id: <201005030849.47591@zmi.at>
In-Reply-To: <20100417012415.GE2493@dastard>
References: <201004161043.11243@zmi.at> <20100417012415.GE2493@dastard>
List-Id: XFS Filesystem from SGI

On
Saturday, 17 April 2010, Dave Chinner wrote:
> They have thousands of extents in them and they are all between
> 8-10GB in size, and IO from my VMs is still capable of saturating
> the disks backing these files. While I'd normally consider these
> files fragmented and candidates for running fsr on them, the number
> of extents is not actually a performance limiting factor and so
> there's no point in defragmenting them. Especially as that requires
> shutting down the VMs...

I personally care less about file fragmentation than about
metadata/inode/directory fragmentation. This server is accessed by
numerous people.

# time find /mountpoint/ -inum 107901420
/mountpoint/some/dir/ectory/path/x.iso

real    7m50.732s
user    0m0.152s
sys     0m2.376s

It took nearly 8 minutes to search through that mount point, which is
6TB big on a RAID-5 striped over 7 2TB disks, so search speed should be
high. Especially as there are only 765,000 files on that disk:

Filesystem    Inodes      IUsed   IFree       IUse%
/mountpoint   1258291200  765659  1257525541  1%

Wouldn't you say an 8-minute search over just 765,000 files is slow,
even when only using 7x 2TB 7200rpm disks in RAID-5?

> > Would it be possible for xfs_fsr to defrag the metadata in a way
> > that it is all kept together, so seeks are faster?
>
> It's not related to fsr because fsr does not defragment metadata.
> Some metadata cannot be defragmented (e.g. inodes cannot be moved),
> some metadata cannot be manipulated directly (e.g. free space
> btrees), and some is just difficult to do (e.g. directory
> defragmentation) so hasn't ever been done.

I see. On this particular server I know it would be good for performance
to have the metadata defragmented, but that's not the aim of xfs_fsr.
But maybe some developer will get bored one day and find a way to
optimize the search & find of files on an aged filesystem, i.e.
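As a rough back-of-the-envelope sketch (the numbers are taken from the
7m50s runtime above and the ~43 reads/second seen in the iostat figures;
nothing here is measured by the script itself):

```shell
#!/bin/sh
# Estimate how many random metadata reads an ~8-minute "find" implies,
# assuming the disk array sustains ~43 random reads/second.
reads_per_sec=43
runtime_sec=$(( 7 * 60 + 50 ))                    # 7m50s = 470 s
total_reads=$(( reads_per_sec * runtime_sec ))
echo "approx. $total_reads random metadata reads" # approx. 20210 random metadata reads
```

That would be only ~20,000 reads for 765,000 inodes, i.e. each metadata
block read presumably covers many inodes at once; the wall-clock time is
dominated by seek latency, not by the amount of data read.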
metadata defrag :-)

I tried this two times:

# time find /mountpoint/ -inum 107901420

real    8m17.316s
user    0m0.148s
sys     0m1.964s

# time find /mountpoint/ -inum 107901420

real    0m30.113s
user    0m0.540s
sys     0m9.813s

Caching helps the 2nd time :-)

> > Currently, when I do "find /this_big_fs -inum 1234", it takes
> > *ages* for a run, while there are not so many files on it:
> > # iostat -kx 5 555
> > Device:  r/s    rkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
> > xvdb     23,20  92,80  8,00      0,42      15,28  18,17  42,16
> > xvdc     20,20  84,00  8,32      0,57      28,40  28,36  57,28
>
> Well, it's not XFS's fault that each read IO is taking 20-30ms. You
> can only do 30-50 IOs a second per drive at that rate, so:
>
> [...]
>
> > So I get 43 reads/second at 100% utilization. Well I can see up to
>
> This is right on the money - it's going as fast as your (slow) RAID-5
> volume will allow it to....
>
> > 150r/s, but still that's no "wow". A single run to find an inode
> > takes a very long time.
>
> RAID 5/6 generally provides the same IOPS performance as a single
> spindle, regardless of the width of the RAID stripe. A 2TB SATA
> drive might be able to do 150-200 IOPS, so a RAID5 array made up of
> these drives will tend to max out at roughly the same....

Running xfs_fsr, I can see up to 1200r+1200w = 2400 I/Os per second:

Device:  rrqm/s  wrqm/s  r/s      w/s      rkB/s     wkB/s     avgrq-sz  avgqu-sz  await  svctm  %util
xvdc     0,00    0,00    0,00     1191,42  0,00      52320,16  87,83     121,23    96,77  0,71   84,63
xvde     0,00    0,00    1226,35  0,00     52324,15  0,00      85,33     0,77      0,62   0,13   15,33

But on average it's about 600-700 reads plus writes per second, so
1200-1400 IOPS.

Both "disks" are 2TB LVM volumes on the same raidset; I just had to
split it, as Xen doesn't allow creating >2TB volumes.

So the badly slow I/O I see during "find" is not happening during fsr.
How can that be?
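The "30-50 IOs a second" figure follows directly from the per-read
latency: a synchronous random-read stream completes at most 1/await
operations per second. A minimal sketch of that arithmetic, using the
~23ms await from the iostat output above:

```shell
#!/bin/sh
# IOPS limit for a synchronous random-read stream: 1 second / per-read latency.
await_ms=23                       # per-read latency (await) from iostat, in ms
iops=$(( 1000 / await_ms ))
echo "$iops IOPS"                 # 43 IOPS - matching the observed 43 reads/second
```

At 30ms that drops to ~33 IOPS, at 20ms it rises to 50, which brackets
Dave's 30-50 range.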
I'm just running another "find" on a freshly remounted XFS, and I can
see the reads are happening on 2 of the 3 2TB volumes in parallel:

Device:  r/s     w/s   rkB/s   wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
xvdb     103,20  0,00  476,80  0,00   9,24      0,46      4,52   4,50   46,40
xvdc     97,80   0,00  455,20  0,00   9,31      0,52      5,29   5,30   51,84

When I created that XFS, I took two 2TB partitions and did pvcreate,
vgcreate and lvcreate. Could it be that lvcreate automatically decided
to do a RAID-0? Because all reads are split equally between the two
volumes. After a while, I added the 3rd 2TB volume, and I can't see
that behaviour there. So maybe this is the source of all evil.

BTW: I changed the mount options "atime,diratime" to
"relatime,reldiratime" now, and the "find" runtime went from 8 minutes
down to 7m14s.

-- 
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services
http://proteger.at [pronounced: Prot-e-schee]
Tel: 0660 / 415 65 31

// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/
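Whether lvcreate actually built a striped (RAID-0-like) LV can be
checked from LVM's segment layout rather than guessed from iostat. A
sketch using the standard LVM2 tools (the VG/LV names are illustrative,
not from this system):

```shell
#!/bin/sh
# Show each LV's segment type (linear vs striped), stripe count,
# stripe size, and the physical volumes backing each segment.
lvs --segments -o +stripes,stripe_size,devices

# Per-segment mapping for a single LV, more verbose:
lvdisplay -m /dev/myvg/mylv     # "myvg"/"mylv" are placeholder names
```

A "striped" segment type with #Str > 1 would confirm the RAID-0
suspicion; a plain "linear" segment means reads alternate between PVs
only because allocation happened to span them.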