From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hans Reiser
Subject: Re: Calling stat with millions of files
Date: Tue, 08 Jun 2004 09:47:42 -0700
Message-ID: <40C5EDAE.6000707@namesys.com>
References: <40C5E92D.2090101@namesys.com> <1086712717.10973.122.camel@watt.suse.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Errors-To: flx@namesys.com
In-Reply-To: <1086712717.10973.122.camel@watt.suse.com>
Content-Type: text/plain; charset="us-ascii"; format="flowed"
To: Chris Mason
Cc: Ross Skaliotis, reiserfs-list@namesys.com

Chris Mason wrote:

>On Tue, 2004-06-08 at 12:28, Hans Reiser wrote:
>
>>Ross Skaliotis wrote:
>>
>>>We have a backup system (backupPC) that is responsible for backing up
>>>millions of files and holding over 2 TB of data. The backup system saves
>>>space by creating hard links where it can. This actually reduces the total
>>>size down to 200 GB, however there are still millions of files/hard links.
>>>
>>>What does this have to do with reiserfs? Well, the partition runs reiserfs
>>>3.6. Over the course of a few months (and the addition of millions more
>>>files/hard links) the performance of the filesystem has become painfully
>>>slow.
>>>
>>>In a directory of ~1000 files, running 'ls -l' or 'stat *' takes 45
>>>seconds to complete (assuming nothing is previously cached). We are using
>>>the r5 hash, and mount the filesystem with noatime. No one directory has
>>>(or will ever have) more than about 2000 files and hard links inside.
>>>
>Mount with -o nodiratime as well.
>
>>>Is this performance decrease when calling stat to be expected with
>>>increasing millions of files/hard links? Would XFS or another filesystem
>>>get around this somehow? I'd rather not use reiserfs4, as this is a
>>>production backup server and needs to be as stable as possible.
>>>
>>>Thanks very much for your help,
>>>
>>hardlinks destroy locality of reference for stat data, this is probably
>>your problem.
>>
>>Probably the readahead for the directory is broken by hard links also.
>>
>>Likely could be optimized but nobody else is complaining about it so I
>>can't really afford to do it just for you, sorry.
>>
>
>You'll get better results with the new block allocator in 2.6.7-rcX-mm,
>but in the end the stat information for the file isn't horribly close to
>the directory entries, and performance won't be perfect.
>
>Hans, I thought reiser4 was going to be good at this kind of thing?
>
>-chris
>

what in reiser4 optimizes accesses to hard links to files whose stat data
is stored in other directories? Maybe the stat data being stored near
other stat data instead of near file bodies will help. Hmmm. Could be,
have to try it to see.
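
For anyone who wants to check whether hard links are what hurts in a given directory, here is a rough sketch in Python. It is not from this thread and not part of reiserfs or BackupPC: the helper name stat_directory and its sort_by_inode option are made up for illustration. It counts how many regular files in a directory have a link count above one and times the lstat calls. Whether stat-ing in inode order helps on reiserfs is an assumption to test, not a known result.

```python
import os
import stat
import time


def stat_directory(path, sort_by_inode=False):
    """lstat every entry in `path`; report the hard-link count and timing.

    Hypothetical helper for illustration only.
    """
    with os.scandir(path) as it:
        # Collect names first; entry.inode() does not need a stat call on POSIX.
        entries = [(entry.inode(), entry.path) for entry in it]
    if sort_by_inode:
        # Assumption to test: inode order may reduce seeking between stat data.
        entries.sort()

    hardlinked = 0
    start = time.perf_counter()
    for _inode, entry_path in entries:
        st = os.lstat(entry_path)
        # A regular file with st_nlink > 1 has at least one other name.
        if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
            hardlinked += 1
    elapsed = time.perf_counter() - start

    return {"entries": len(entries), "hardlinked": hardlinked, "seconds": elapsed}
```

The timing only means something on a cold cache (for example right after mounting the partition), since that is the case the 45-second figure above describes. Comparing a directory that is mostly hard links against one of similar size with none would show how much of the slowdown comes from the links.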