From: Hans Reiser <reiser@namesys.com>
To: Chris Mason <mason@suse.com>
Cc: Ross Skaliotis <mross@rs-net.org>, reiserfs-list@namesys.com
Subject: Re: Calling stat with millions of files
Date: Tue, 08 Jun 2004 09:47:42 -0700
Message-ID: <40C5EDAE.6000707@namesys.com>
In-Reply-To: <1086712717.10973.122.camel@watt.suse.com>

Chris Mason wrote:

>On Tue, 2004-06-08 at 12:28, Hans Reiser wrote:
>
>>Ross Skaliotis wrote:
>>
>>>We have a backup system (backupPC) that is responsible for backing up
>>>millions of files and holding over 2 TB of data. The backup system saves
>>>space by creating hard links where it can. This actually reduces the total
>>>size down to 200 GB, however there are still millions of files/hard links.
>>>
>>>What does this have to do with reiserfs? Well, the partition runs reiserfs
>>>3.6. Over the course of a few months (and the addition of millions more
>>>files/hard links) the performance of the filesystem has become painfully
>>>slow.
>>>
>>>In a directory of ~1000 files, running 'ls -l' or 'stat *' takes 45
>>>seconds to complete (assuming nothing is previously cached). We are using
>>>the r5 hash, and mount the filesystem with noatime. No one directory has
>>>(or will ever have) more than about 2000 files and hard links inside.
>>>
>>>      
>>>
>Mount with -o nodiratime as well.
>
>  
>
>>>Is this performance decrease when calling stat to be expected with
>>>increasing millions of files/hard links? Would XFS or another filesystem
>>>get around this somehow? I'd rather not use reiserfs4, as this is a
>>>production backup server and needs to be as stable as possible.
>>>
>>>Thanks very much for your help,
>>>
>>>      
>>>
>>hardlinks destroy locality of reference for stat data, this is probably 
>>your problem.
>>
>>Probably the readahead for the directory is broken by hard links also.
>>
>>Likely could be optimized but nobody else is complaining about it so I 
>>can't really afford to do it just for you, sorry.
>>
>>
>>    
>>
>
>You'll get better results with the new block allocator in 2.6.7-rcX-mm,
>but in the end the stat information for the file isn't horribly close to
>the directory entries, and performance won't be perfect.
>
>Hans, I thought reiser4 was going to be good at this kind of thing?
>
>-chris
>
What in reiser4 optimizes access to hard links to files whose stat
data is stored in other directories?  Maybe storing the stat data
near other stat data instead of near file bodies will help.  Hmmm.
Could be; we would have to try it to see.
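
For anyone who wants to "try it to see", here is a minimal sketch of how the effect could be measured from user space. It is an illustration only, not something from this thread: the function names `stat_directory` and `hardlinked_fraction` are hypothetical, and the timing is only meaningful when the directory is not already cached. The idea is to time a full pass of stat calls over one directory and report how many entries are hard links (link count above one), since those are the entries whose stat data may live far from the directory.

```python
import os
import time


def stat_directory(path):
    """Stat every entry in `path`.

    Returns (elapsed_seconds, results), where results is a list of
    (name, inode_number, link_count) tuples sorted by name.
    """
    start = time.perf_counter()
    results = []
    for name in sorted(os.listdir(path)):
        # lstat so that symlinks are reported as themselves, like 'ls -l'
        st = os.lstat(os.path.join(path, name))
        results.append((name, st.st_ino, st.st_nlink))
    elapsed = time.perf_counter() - start
    return elapsed, results


def hardlinked_fraction(results):
    """Fraction of entries whose link count is greater than one."""
    if not results:
        return 0.0
    linked = sum(1 for _, _, nlink in results if nlink > 1)
    return linked / len(results)


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    elapsed, results = stat_directory(target)
    print(f"{len(results)} entries in {elapsed:.3f}s, "
          f"{hardlinked_fraction(results):.0%} with more than one link")
```

Running it against the same directory on two filesystems (or before and after a change of block allocator) gives a rough comparison; a high hard-linked fraction together with a slow cold pass would be consistent with the locality explanation above, though it does not prove it.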
