* ReiserFS v3 + millions of files?
From: Dan Oglesby @ 2003-10-27  3:46 UTC
  To: reiserfs-list

Greetings...

Long time ReiserFS user, first time I've had a problem.  Signed up for the 
mailing list last week, and was surprised to see so little traffic (might be 
a good thing?).

I'm running Red Hat 7.3 using a Red Hat 2.4.20 kernel.  The system has a 
RAID-5 array via 3Ware 7500 controller, and three Western Digital 120GB 
"Special Edition" hard drives.  The array is one filesystem, ReiserFS.  The 
operating system, swap, and other files are stored on a hard drive that is on 
the primary IDE controller off of the motherboard.

The system is a single board computer, with a P4 3.06 GHz hyperthreaded 
processor (kernel is SMP enabled), 512MB of RAM, and contains a mix of 
ReiserFS and EXT2 filesystems on the primary drive (ReiserFS only on the 
array).  No NFS.

The array is used to store what will basically amount to more than one million 
files with an average size of sixty kilobytes.

During simulations for file writes, I'm seeing write performance begin to drop 
dramatically after 800,000 files have been stored on the filesystem.

The filesystem is being mounted with the following options:  
defaults,notail,noatime,nodiratime

The filesystem was created with default options, basically a
"mkreiserfs /dev/sda1".

Is this behavior I should expect from ReiserFS v3?

This week I will be switching from a Red Hat kernel to a vanilla kernel (from 
kernel.org), first the latest 2.4 kernel, then the latest 2.6 kernel.  After 
that...  I dunno.

Help?

--Dan Oglesby


* Re: ReiserFS v3 + millions of files?
From: Dan Oglesby @ 2003-10-28 14:50 UTC
  To: Reiserfs mail-list

> Dan Oglesby writes:
>  > On Monday 27 October 2003 5:42 am, you wrote:
>  > > Dan Oglesby writes:
>  > >  > Greetings...
>  > >  >
>  > >  > Long time ReiserFS user, first time I've had a problem.  Signed up for
>  > >  > the mailing list last week, and was surprised to see so little traffic
>  > >  > (might be a good thing?).
>  > >  >
>  > >  > I'm running Red Hat 7.3 using a Red Hat 2.4.20 kernel.  The system has a
>  > >  > RAID-5 array via 3Ware 7500 controller, and three Western Digital 120GB
>  > >  > "Special Edition" hard drives.  The array is one filesystem, ReiserFS. 
>  > >  > The operating system, swap, and other files are stored on a hard drive
>  > >  > that is on the primary IDE controller off of the motherboard.
>  > >  >
>  > >  > The system is a single board computer, with a P4 3.06 GHz hyperthreaded
>  > >  > processor (kernel is SMP enabled), 512MB of RAM, and contains a mix of
>  > >  > ReiserFS and EXT2 filesystems on the primary drive (ReiserFS only on the
>  > >  > array).  No NFS.
>  > >  >
>  > >  > The array is used to store what will basically amount to more than one
>  > >  > million files with an average size of sixty kilobytes.
>  > >  >
>  > >  > During simulations for file writes, I'm seeing write performance begin
>  > >  > to drop dramatically after 800,000 files have been stored on the
>  > >  > filesystem.
>  > >
>  > > Reiserfs (both v3 and v4) stores directory entries (names within
>  > > directory) sorted by a hash of the file name. If files are created in
>  > > the "random" order, that is, if hashes of names aren't more or less
>  > > monotonic, reiserfs will have to modify the same block many times during
>  > > insertion of large number of files. As blocks with names stop fitting
>  > > into memory this means that the same block has to be fetched many times
>  > > from the disk.
>  > >
>  > > To confirm that this really is the cause of your problem, please
>  > > answer the following questions:
>  > >
>  > > 1. are all your files in the same directory?
>  > 
>  > For now, yes.
>  > 
>  > > 2. how are the names of the files generated?
>  > 
>  > Filenames are very long, based on site ID, internal codes, date and time.  A 
>  > generic example would be:
> 
> [I would prefer to have this discussion continued on our mailing
> list (Reiserfs mail-list <Reiserfs-List@Namesys.COM>), so if you don't
> object, please CC your reply there.]

Sorry about that.  I must have missed the headers on one of my replies.

> 
>  > 
>  > 		siteID.1234.4321.20031028123456789.ext
> 
> I wasn't precise enough in forming the question. How do the file names
> change over time? That is, how do the sequences of digits above
> depend on time? I can guess that "20031028" is a date, but what about
> the other parts?
> 

Basically, the siteID doesn't change, and I believe the next two 
sections ("1234.4321" in the example above) don't change much either. 
The only part of the filename that changes constantly is the date/time 
section, just before the extension.

> Can you modify the file name pattern so that the name's initial prefix
> increases (in lexicographical order) over time? Like this:
> 
> YYYYMMDDHHmmss.sequential-no.siteID.rest.ext
> 
> this should get rid of, or at least alleviate the problem you have
> described.
> 

I don't see why that would be a problem.  I'll present this option to 
the developers today, and will hopefully be testing very soon.
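
For anyone trying the same change, here is a minimal sketch of the rename 
suggested above. It assumes every name has exactly five dot-separated 
fields, as in the generic example (siteID.1234.4321.20031028123456789.ext), 
and that the date/time field always has the same width. The function name 
and the six-digit sequence padding are made up for illustration; they are 
not from the thread. Whether the new order actually helps still has to be 
measured.

```python
def time_first_name(name: str, seq: int) -> str:
    """Move the date/time field of 'site.a.b.timestamp.ext' to the front.

    Assumption: exactly five dot-separated fields with a fixed-width
    timestamp in the fourth position. Anything else raises ValueError.
    """
    site, code_a, code_b, stamp, ext = name.split(".")
    # Zero-pad the (hypothetical) sequence number so that string order
    # matches numeric order for names sharing the same timestamp.
    return f"{stamp}.{seq:06d}.{site}.{code_a}.{code_b}.{ext}"


if __name__ == "__main__":
    print(time_first_name("siteID.1234.4321.20031028123456789.ext", 1))
```

With fixed-width fields, names generated later sort after names generated 
earlier, which is the property asked for in the suggestion.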

>  > 
>  > Most files are that size or several characters longer.
>  > 	
>  > > 3. what hash are you using on reiserfs (default is r5)?
>  > >
>  > 
>  > Default (r5).
>  > 
>  > > Reiserfs has another problem due to its limited capability of handling
>  > > hash collisions in file names. Reiser4 scales much better in this
>  > > respect, and generally, works fine with scores of millions of files in
>  > > one directory.
>  > >
>  > 
>  > Sounds like you guys/gals are going to have another person willing to run the 
>  > current incarnation of Reiser4 on a test machine in the very near future.
> 
> Reiser4 is not yet ready for production. Only use it to manipulate data
> that can be recovered by other means.

I know Reiser4 isn't ready for production, but I wouldn't mind checking 
out the performance differences on a test machine.

Thanks for the info...  I'll see what happens when I apply your theory 
to one of our test machines.
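
One way to compare naming schemes on a test machine is to time file 
creation in fixed-size batches and look for the batch where the time per 
batch starts to climb. The sketch below is only an illustration: the 
function name, batch size and payload size are assumptions, and it is not 
the simulation used for the numbers reported earlier in this thread.

```python
import os
import time


def time_batches(directory, names, batch_size=1000, payload=b"x" * 1024):
    """Create one file per name and return the seconds spent on each batch.

    All parameters are illustrative: pass the directory under test, any
    iterable of file names, and a payload close to the real file size.
    """
    timings = []
    count = 0
    start = time.perf_counter()
    for name in names:
        with open(os.path.join(directory, name), "wb") as f:
            f.write(payload)
        count += 1
        if count % batch_size == 0:
            now = time.perf_counter()
            timings.append(now - start)
            start = now
    if count % batch_size:
        # Record the final, partially filled batch as well.
        timings.append(time.perf_counter() - start)
    return timings
```

Running it once with the current names and once with time-first names, 
each on a freshly created filesystem, should show whether the slowdown 
moves. Writes that are still sitting in the page cache are not flushed 
here, so short runs will look faster than the disk really is.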

--Dan



Thread overview: 16+ messages
2003-10-27  3:46 ReiserFS v3 + millions of files? Dan Oglesby
2003-10-27  6:55 ` Hans Reiser
2003-10-27  9:38 ` Hans Reiser
2003-10-27 11:42 ` Nikita Danilov
2003-10-27 16:03   ` Object Oriented FS darren
2003-10-27 18:23     ` Hans Reiser
2003-10-27 18:43       ` Mike Young
2003-10-27 19:01         ` Hans Reiser
2003-10-27 18:56       ` Andreas Dilger
2003-10-28  1:40         ` Steven Cole
2003-10-28  4:46     ` lrc1
2003-10-29  6:03 ` ReiserFS v3 + millions of files? Todd Lyons
2003-10-29  8:44   ` Hans Reiser
2003-10-30  6:04     ` Todd Lyons
2003-10-30  7:33       ` Andreas Dilger
  -- strict thread matches above, loose matches on Subject: below --
2003-10-28 14:50 Dan Oglesby
