* ReiserFS v3 + millions of files?
@ 2003-10-27  3:46 Dan Oglesby
  2003-10-27  6:55 ` Hans Reiser
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Dan Oglesby @ 2003-10-27  3:46 UTC
  To: reiserfs-list

Greetings...

Long time ReiserFS user, first time I've had a problem.  Signed up for the 
mailing list last week, and was surprised to see so little traffic (might be 
a good thing?).

I'm running Red Hat 7.3 using a Red Hat 2.4.20 kernel.  The system has a 
RAID-5 array via 3Ware 7500 controller, and three Western Digital 120GB 
"Special Edition" hard drives.  The array is one filesystem, ReiserFS.  The 
operating system, swap, and other files are stored on a hard drive that is on 
the primary IDE controller off of the motherboard.

The system is a single board computer, with a P4 3.06 GHz hyperthreaded 
processor (kernel is SMP enabled), 512MB of RAM, and contains a mix of 
ReiserFS and EXT2 filesystems on the primary drive (ReiserFS only on the 
array).  No NFS.

The array is used to store what will basically amount to more than one million 
files with an average size of sixty kilobytes.

During simulations for file writes, I'm seeing write performance begin to drop 
dramatically after 800,000 files have been stored on the filesystem.

The filesystem is being mounted with the following options:  
defaults,notail,noatime,nodiratime

The filesystem was created with default options, basically a
"mkreiserfs /dev/sda1".

Is this behavior I should expect from ReiserFS v3?

This week I will be switching from a Red Hat kernel to a vanilla kernel (from 
kernel.org), first the latest 2.4 kernel, then the latest 2.6 kernel.  After 
that...  I dunno.

Help?

--Dan Oglesby



* Re: ReiserFS v3 + millions of files?
  2003-10-27  3:46 Dan Oglesby
@ 2003-10-27  6:55 ` Hans Reiser
  2003-10-27  9:38 ` Hans Reiser
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Hans Reiser @ 2003-10-27  6:55 UTC
  To: Dan Oglesby; +Cc: reiserfs-list

Dan Oglesby wrote:

>Greetings...
>
>Long time ReiserFS user, first time I've had a problem.  Signed up for the 
>mailing list last week, and was surprised to see so little traffic (might be 
>a good thing?).
>
>I'm running Red Hat 7.3 using a Red Hat 2.4.20 kernel.  The system has a 
>RAID-5 array via 3Ware 7500 controller, and three Western Digital 120GB 
>"Special Edition" hard drives.  The array is one filesystem, ReiserFS.  The 
>operating system, swap, and other files are stored on a hard drive that is on 
>the primary IDE controller off of the motherboard.
>
>The system is a single board computer, with a P4 3.06 GHz hyperthreaded 
>processor (kernel is SMP enabled), 512MB of RAM, and contains a mix of 
>ReiserFS and EXT2 filesystems on the primary drive (ReiserFS only on the 
>array).  No NFS.
>
>The array is used to store what will basically amount to more than one million 
>files with an average size of sixty kilobytes.
>
>During simulations for file writes, I'm seeing write performance begin to drop 
>dramatically after 800,000 files have been stored on the filesystem.
>
>The filesystem is being mounted with the following options:  
>defaults,notail,noatime,nodiratime
>
>The filesystem was created with default options, basically a
>"mkreiserfs /dev/sda1".
>
>Is this behavior I should expect from ReiserFS v3?
>
>This week I will be switching from a Red Hat kernel to a vanilla kernel (from 
>kernel.org), first the latest 2.4 kernel, then the latest 2.6 kernel.  After 
>that...  I dunno.
>
>Help?
>
>--Dan Oglesby
>
>
>
>  
>
what size directories?

-- 
Hans




* Re: ReiserFS v3 + millions of files?
  2003-10-27  3:46 Dan Oglesby
  2003-10-27  6:55 ` Hans Reiser
@ 2003-10-27  9:38 ` Hans Reiser
  2003-10-27 11:42 ` Nikita Danilov
  2003-10-29  6:03 ` Todd Lyons
  3 siblings, 0 replies; 9+ messages in thread
From: Hans Reiser @ 2003-10-27  9:38 UTC
  To: Dan Oglesby; +Cc: reiserfs-list, demidov

I have unsubstantiated suspicions about VFS/dcache scalability.  It 
would be interesting to determine such things as how much RAM is 
consumed by the dcache, etc.
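
One rough way to check how much memory the dcache holds is to read the
dentry line from /proc/slabinfo.  The sketch below is only an
illustration: the function name is made up, and it assumes each data
line begins with the cache name, active objects, total objects, and
object size in bytes, and that the cache is called "dentry_cache" (2.4
and 2.6 kernels) or "dentry" (later kernels).  It counts object memory
only and ignores slab overhead.

```python
def dentry_cache_bytes(slabinfo_text):
    """Estimate bytes held by dentry objects from /proc/slabinfo text.

    Assumption: data lines start with
        name  active_objects  total_objects  object_size_bytes
    Returns None if no dentry cache line is found.
    """
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Assumed cache names; header lines never match these.
        if len(fields) >= 4 and fields[0] in ("dentry_cache", "dentry"):
            total_objects = int(fields[2])
            object_size = int(fields[3])
            return total_objects * object_size
    return None
```

On a live system the text would come from reading /proc/slabinfo, which
may need root on newer kernels.  Sampling it every so often during the
file-creation run would show whether the dcache keeps growing as the
file count approaches the point where writes slow down.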

Mr. Demidov, how long until sys_reiser4() can be used to create a 
million files sized 60k using lnodes?  It would be an interesting 
benchmarking experiment.....

sys_reiser4() sidesteps dcache....

Hans

Dan Oglesby wrote:

>Greetings...
>
>Long time ReiserFS user, first time I've had a problem.  Signed up for the 
>mailing list last week, and was surprised to see so little traffic (might be 
>a good thing?).
>
>I'm running Red Hat 7.3 using a Red Hat 2.4.20 kernel.  The system has a 
>RAID-5 array via 3Ware 7500 controller, and three Western Digital 120GB 
>"Special Edition" hard drives.  The array is one filesystem, ReiserFS.  The 
>operating system, swap, and other files are stored on a hard drive that is on 
>the primary IDE controller off of the motherboard.
>
>The system is a single board computer, with a P4 3.06 GHz hyperthreaded 
>processor (kernel is SMP enabled), 512MB of RAM, and contains a mix of 
>ReiserFS and EXT2 filesystems on the primary drive (ReiserFS only on the 
>array).  No NFS.
>
>The array is used to store what will basically amount to more than one million 
>files with an average size of sixty kilobytes.
>
>During simulations for file writes, I'm seeing write performance begin to drop 
>dramatically after 800,000 files have been stored on the filesystem.
>
>The filesystem is being mounted with the following options:  
>defaults,notail,noatime,nodiratime
>
>The filesystem was created with default options, basically a
>"mkreiserfs /dev/sda1".
>
>Is this behavior I should expect from ReiserFS v3?
>
>This week I will be switching from a Red Hat kernel to a vanilla kernel (from 
>kernel.org), first the latest 2.4 kernel, then the latest 2.6 kernel.  After 
>that...  I dunno.
>
>Help?
>
>--Dan Oglesby
>
>
>
>  
>


-- 
Hans




* Re: ReiserFS v3 + millions of files?
  2003-10-27  3:46 Dan Oglesby
  2003-10-27  6:55 ` Hans Reiser
  2003-10-27  9:38 ` Hans Reiser
@ 2003-10-27 11:42 ` Nikita Danilov
  2003-10-29  6:03 ` Todd Lyons
  3 siblings, 0 replies; 9+ messages in thread
From: Nikita Danilov @ 2003-10-27 11:42 UTC
  To: Dan Oglesby; +Cc: reiserfs-list

Dan Oglesby writes:
 > Greetings...
 > 
 > Long time ReiserFS user, first time I've had a problem.  Signed up for the 
 > mailing list last week, and was surprised to see so little traffic (might be 
 > a good thing?).
 > 
 > I'm running Red Hat 7.3 using a Red Hat 2.4.20 kernel.  The system has a 
 > RAID-5 array via 3Ware 7500 controller, and three Western Digital 120GB 
 > "Special Edition" hard drives.  The array is one filesystem, ReiserFS.  The 
 > operating system, swap, and other files are stored on a hard drive that is on 
 > the primary IDE controller off of the motherboard.
 > 
 > The system is a single board computer, with a P4 3.06 GHz hyperthreaded 
 > processor (kernel is SMP enabled), 512MB of RAM, and contains a mix of 
 > ReiserFS and EXT2 filesystems on the primary drive (ReiserFS only on the 
 > array).  No NFS.
 > 
 > The array is used to store what will basically amount to more than one million 
 > files with an average size of sixty kilobytes.
 > 
 > During simulations for file writes, I'm seeing write performance begin to drop 
 > dramatically after 800,000 files have been stored on the filesystem.

Reiserfs (both v3 and v4) stores directory entries (names within a
directory) sorted by a hash of the file name. If files are created in
"random" order, that is, if the hashes of the names aren't more or less
monotonic, reiserfs will have to modify the same block many times during
insertion of a large number of files. Once the blocks holding the names
stop fitting into memory, the same block has to be fetched from the
disk many times.
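
The effect can be illustrated with a toy model.  This is not reiserfs
tree code: the key space is simply cut into equal-sized "blocks", an LRU
cache stands in for memory, and every name below is invented for the
sketch.  It counts how often a block has to be fetched when the same
keys arrive in sorted order versus random order.

```python
import random
from collections import OrderedDict

def count_block_fetches(keys, key_space, num_blocks, cache_blocks):
    """Count block fetches for inserts arriving in the given key order."""
    cache = OrderedDict()              # block number -> True, in LRU order
    fetches = 0
    for key in keys:
        block = key * num_blocks // key_space   # block owning this key
        if block in cache:
            cache.move_to_end(block)            # already in memory
        else:
            fetches += 1                        # must be read from "disk"
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)       # evict least recently used
    return fetches

KEY_SPACE = 1 << 31
NUM_BLOCKS = 1000       # blocks needed to hold the directory (assumed)
CACHE_BLOCKS = 50       # blocks that fit in memory (assumed)

rng = random.Random(0)
random_keys = [rng.randrange(KEY_SPACE) for _ in range(100_000)]

monotonic_fetches = count_block_fetches(
    sorted(random_keys), KEY_SPACE, NUM_BLOCKS, CACHE_BLOCKS)
scattered_fetches = count_block_fetches(
    random_keys, KEY_SPACE, NUM_BLOCKS, CACHE_BLOCKS)
print(monotonic_fetches, scattered_fetches)
```

With sorted keys each block is fetched at most once.  With random keys
most inserts miss the cache once the directory is much larger than
memory, which fits a slowdown that appears only after the file count
passes some threshold.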

To confirm that this is really the cause of your problem, please
answer the following questions:

1. Are all your files in the same directory?
2. How are the file names generated?
3. Which hash are you using on reiserfs (the default is r5)?

Reiserfs has another problem due to its limited capability of handling
hash collisions in file names. Reiser4 scales much better in this
respect, and generally, works fine with scores of millions of files in
one directory.

 > 
 > The filesystem is being mounted with the following options:  
 > defaults,notail,noatime,nodiratime
 > 
 > The filesystem was created with default options, basically a
 > "mkreiserfs /dev/sda1".
 > 
 > Is this behavior I should expect from ReiserFS v3?
 > 
 > This week I will be switching from a Red Hat kernel to a vanilla kernel (from 
 > kernel.org), first the latest 2.4 kernel, then the latest 2.6 kernel.  After 
 > that...  I dunno.
 > 
 > Help?
 > 
 > --Dan Oglesby
 > 

Nikita.


* Re: ReiserFS v3 + millions of files?
@ 2003-10-28 14:50 Dan Oglesby
  0 siblings, 0 replies; 9+ messages in thread
From: Dan Oglesby @ 2003-10-28 14:50 UTC
  To: Reiserfs mail-list

> Dan Oglesby writes:
>  > On Monday 27 October 2003 5:42 am, you wrote:
>  > > Dan Oglesby writes:
>  > >  > Greetings...
>  > >  >
>  > >  > Long time ReiserFS user, first time I've had a problem.  Signed up for
>  > >  > the mailing list last week, and was surprised to see so little traffic
>  > >  > (might be a good thing?).
>  > >  >
>  > >  > I'm running Red Hat 7.3 using a Red Hat 2.4.20 kernel.  The system has a
>  > >  > RAID-5 array via 3Ware 7500 controller, and three Western Digital 120GB
>  > >  > "Special Edition" hard drives.  The array is one filesystem, ReiserFS. 
>  > >  > The operating system, swap, and other files are stored on a hard drive
>  > >  > that is on the primary IDE controller off of the motherboard.
>  > >  >
>  > >  > The system is a single board computer, with a P4 3.06 GHz hyperthreaded
>  > >  > processor (kernel is SMP enabled), 512MB of RAM, and contains a mix of
>  > >  > ReiserFS and EXT2 filesystems on the primary drive (ReiserFS only on the
>  > >  > array).  No NFS.
>  > >  >
>  > >  > The array is used to store what will basically amount to more than one
>  > >  > million files with an average size of sixty kilobytes.
>  > >  >
>  > >  > During simulations for file writes, I'm seeing write performance begin
>  > >  > to drop dramatically after 800,000 files have been stored on the
>  > >  > filesystem.
>  > >
>  > > Reiserfs (both v3 and v4) stores directory entries (names within
>  > > directory) sorted by a hash of the file name. If files are created in
>  > > the "random" order, that is, if hashes of names aren't more or less
>  > > monotonic, reiserfs will have to modify the same block many times during
>  > > insertion of large number of files. As blocks with names stop fitting
>  > > into memory this means that the same block has to be fetched many times
>  > > from the disk.
>  > >
>  > > To confirm that this is really what the cause of your problem, please
>  > > answer following questions:
>  > >
>  > > 1. are all your files in the same directory?
>  > 
>  > For now, yes.
>  > 
>  > > 2. how names of files are generated?
>  > 
>  > Filenames are very long, based on site ID, internal codes, date and time.  A 
>  > generic example would be:
> 
> [I would prefer to have this discussion continued on our mailing
> list (Reiserfs mail-list <Reiserfs-List@Namesys.COM>), so if you don't
> object, please CC your reply there.]

Sorry about that.  I must have missed the headers on one of my replies.

> 
>  > 
>  > 		siteID.1234.4321.20031028123456789.ext
> 
> I wasn't precise enough in forming the question. How do the file names
> change over time? That is, how do the sequences of digits above
> depend on time? I can guess that "20031028" is a date, but what about
> the other parts?
> 

Basically, the siteID doesn't change, and I don't believe the next two 
sections ("1234.4321" in the example above) change much either. 
The only part of the filename that changes constantly is the date/time 
section, just before the end.

> Can you modify the file name pattern so that the name's initial prefix
> increases (in lexicographical order) over time? Like this:
> 
> YYYYMMDDHHmmss.sequential-no.siteID.rest.ext
> 
> this should get rid of, or at least alleviate the problem you have
> described.
> 

I don't see why that would be a problem.  I'll present this option to 
the developers today, and will hopefully be testing very soon.
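
A minimal sketch of the renaming suggested above, assuming names are
dot-separated with the timestamp always in the fourth field, as in the
example "siteID.1234.4321.20031028123456789.ext".  The function name and
the fixed field position are assumptions made for illustration, and the
sequential number from the suggested pattern is left out.

```python
def time_first_name(name, timestamp_field=3):
    """Move the timestamp field of a dot-separated name to the front."""
    fields = name.split(".")
    timestamp = fields.pop(timestamp_field)   # e.g. "20031028123456789"
    return ".".join([timestamp] + fields)

old_names = [
    "siteID.1234.4321.20031028123456789.ext",
    "siteID.1234.4321.20031028123456790.ext",
]
new_names = [time_first_name(name) for name in old_names]
print(new_names[0])   # 20031028123456789.siteID.1234.4321.ext
```

Because the timestamp is fixed-width, names built this way sort
lexicographically in creation order.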

>  > 
>  > Most files are that size or several characters longer.
>  > 	
>  > > 3. what hash are using on reiserfs (default is r5).
>  > >
>  > 
>  > Default (r5).
>  > 
>  > > Reiserfs has another problem due to its limited capability of handling
>  > > hash collisions in file names. Reiser4 scales much better in this
>  > > respect, and generally, works fine with scores of millions of files in
>  > > one directory.
>  > >
>  > 
>  > Sounds like you guys/gals are going to have another person willing to run the 
>  > current incarnation of Reiser4 on a test machine in the very near future.
> 
> Reiser4 is not yet ready for production. Only use it to manipulate data
> that can be recovered by other means.

I know Reiser4 isn't ready for production, but I wouldn't mind checking 
out the performance differences on a test machine.

Thanks for the info...  I'll see what happens when I apply your theory 
to one of our test machines.

--Dan



* Re: ReiserFS v3 + millions of files?
  2003-10-27  3:46 Dan Oglesby
                   ` (2 preceding siblings ...)
  2003-10-27 11:42 ` Nikita Danilov
@ 2003-10-29  6:03 ` Todd Lyons
  2003-10-29  8:44   ` Hans Reiser
  3 siblings, 1 reply; 9+ messages in thread
From: Todd Lyons @ 2003-10-29  6:03 UTC
  To: reiserfs-list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dan Oglesby wanted us to know:

>The array is used to store what will basically amount to more than one million 
>files with an average size of sixty kilobytes.

If I had to make an educated guess, you're doing something similar to
what I'm doing.  That's all you'll get out of me for now :-)

>During simulations for file writes, I'm seeing write performance begin to drop 
>dramatically after 800,000 files have been stored on the filesystem.

Hmmm, I did some tests that went up to 300,000 files with an average
size of 20Kbytes.  The end application will be targeted at 6.5 million
files with the same average 20Kbyte size.

For my tests, I compared EXT3, XFS, reiser(tails), and reiser(notail).
The results are at http://www.mrball.net/file%20creation%20tests.html

After reading your post, I'm going to repartition and repeat my tests
with much larger filecounts.

>The filesystem is being mounted with the following options:  
>defaults,notail,noatime,nodiratime

I'll repeat with noatime and nodiratime, but my simple little benchmark
opens (creates), writes, and closes.  By my guesstimate, those two
options will have very little effect.  Maybe I'm wrong though.
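
A benchmark of that shape (open/create, write, close) could look
roughly like the sketch below.  It is not the test program used for the
results linked above; the function name, file naming scheme, batch size,
and demo parameters are all invented for illustration.

```python
import os
import tempfile
import time

def create_files(directory, count, size, batch=1000):
    """Create `count` files of `size` bytes; return seconds per batch."""
    payload = b"\0" * size
    timings = []
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"file{i:08d}.dat")
        with open(path, "wb") as handle:    # open (create), write, close
            handle.write(payload)
        if (i + 1) % batch == 0:
            now = time.perf_counter()
            timings.append(now - start)     # time taken by this batch
            start = now
    return timings

# Small demo run in a scratch directory; a real test would point at the
# filesystem under test and use far larger counts.
with tempfile.TemporaryDirectory() as scratch:
    demo_timings = create_files(scratch, count=200, size=20 * 1024, batch=100)
    demo_created = len(os.listdir(scratch))
print(demo_created, demo_timings)
```

Recording the time per batch rather than one total makes it possible to
see where the creation rate starts to fall off as the file count grows.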

>The filesystem was created with default options, basically a
>"mkreiserfs /dev/sda1".

I made a point of using slow IDE drives for my tests.  They were done
on a 5400 RPM IDE drive.
- -- 
Blue skies...	Todd 	Public key: http://www.mrball.net/todd.asc
 Favourite shell:  bash, though I also like 'init=/usr/bin/emacs'
                                                --Andrew Tridgell
Linux kernel 2.4.22-10mm.2mdk   1 user,  load average: 0.00, 0.03, 0.06
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
Comment: http://www.mrball.net/todd.asc

iD8DBQE/n1hBIBT1264ScBURAhc7AKDkwHfhMXdbkEN14d2FVrwsb1A7YgCgymmN
bFouT/2wnIEA6N3fd4u6q3Q=
=xoll
-----END PGP SIGNATURE-----


* Re: ReiserFS v3 + millions of files?
  2003-10-29  6:03 ` Todd Lyons
@ 2003-10-29  8:44   ` Hans Reiser
  2003-10-30  6:04     ` Todd Lyons
  0 siblings, 1 reply; 9+ messages in thread
From: Hans Reiser @ 2003-10-29  8:44 UTC
  To: Todd Lyons; +Cc: reiserfs-list

Todd Lyons wrote:

>
>
> For my tests, I compared EXT3, XFS, reiser(tails), and reiser(notail).
> The results are at http://www.mrball.net/file%20creation%20tests.html
>
>
Is ext3 using htree?

-- 
Hans




* Re: ReiserFS v3 + millions of files?
  2003-10-29  8:44   ` Hans Reiser
@ 2003-10-30  6:04     ` Todd Lyons
  2003-10-30  7:33       ` Andreas Dilger
  0 siblings, 1 reply; 9+ messages in thread
From: Todd Lyons @ 2003-10-30  6:04 UTC
  To: Hans Reiser; +Cc: reiserfs-list

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hans Reiser wanted us to know:

>>For my tests, I compared EXT3, XFS, reiser(tails), and reiser(notail).
>>The results are at http://www.mrball.net/file%20creation%20tests.html
>Is ext3 using htree?

You know, I don't know the answer to that.  I would guess that it's the
default compilation options, so if you say that it uses that by default,
I'll believe you.  I tried grepping the binary but didn't find anything
that indicated type of structure.

I found this in namei.c.  Dunno if it helps:
 *  Directory entry file type support and forward compatibility hooks
 *      for B-tree directories by Theodore Ts'o (tytso@mit.edu), 1998

To me that doesn't really tell anything though.

Anyway, I fired off an instance of my test program that created 3
million files and it finished in about 26 minutes.  But then I found
that I had filled the partition so I shortened the file size a little
and restarted it.  It was running when I left and I'll analyze the logs
tomorrow.  BTW, that was on a machine with 5400 RPM 40 GB HD and 128
Megs of RAM (trying to starve the system for resources to get rid of
effects of caching and buffering).
- -- 
Blue skies...	Todd 	  Public key: http://www.mrball.net/todd.asc
    They dont need to adjust their pricing, they just need to 
   lobby for new laws to protect their flawed business models. 
              Oh wait, they just did that.         --Dan Hollis
Linux kernel 2.4.22-10mm.2mdk   1 user,  load average: 0.02, 0.01, 0.00
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
Comment: http://www.mrball.net/todd.asc

iD8DBQE/oKn4IBT1264ScBURAgDzAJwKxhDmsx4eEsgZk5uQtT3G0yproACfVb4o
LlfB7Z4tO3cNQJo4AqOkl6g=
=zWHs
-----END PGP SIGNATURE-----


* Re: ReiserFS v3 + millions of files?
  2003-10-30  6:04     ` Todd Lyons
@ 2003-10-30  7:33       ` Andreas Dilger
  0 siblings, 0 replies; 9+ messages in thread
From: Andreas Dilger @ 2003-10-30  7:33 UTC
  To: Todd Lyons; +Cc: Hans Reiser, reiserfs-list

On Oct 29, 2003  22:04 -0800, Todd Lyons wrote:
> Hans Reiser wrote:
> > Is ext3 using htree?
> 
> You know, I don't know the answer to that.  I would guess that it's the
> default compilation options, so if you say that it uses that by default,
> I'll believe you.  I tried grepping the binary but didn't find anything
> that indicated type of structure.

Unlikely.  The htree (also called indexed directory) code is not in any
common kernel, although we have been using it extensively for Lustre for
many months without problems.

In the past we did some comparisons between ext3+htree and reiserfs for
creating 10M files in a single directory and for local filesystems both
started at around 25k-30k/s but ext3 started to taper off after about
1M files to under 10k/s (this was on a very large/fast SCSI RAID array)
while reiserfs kept pretty steady at 25k/s for the whole 10M files.

Ext3 was a lot worse initially, but I wrote a patch which avoided reading
empty inode table blocks from disk while it was allocating inodes.  That
patch made it into 2.6 I think, but isn't in stock 2.4.  What also makes
a big difference is the size of your journal, because you can quickly
dirty a lot of blocks as you insert filenames into the directory when
there are a lot of directory blocks.
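
For scale, the rates quoted in this thread work out roughly as follows.
This is back-of-the-envelope arithmetic only.  The helper names are made
up, and the hardware behind each figure differs, so the numbers are not
directly comparable.

```python
def seconds_to_create(files, rate):
    """Time to create `files` files at a steady `rate` in files/second."""
    return files / rate

def files_per_second(files, minutes):
    """Average creation rate for `files` files created in `minutes`."""
    return files / (minutes * 60)

# reiserfs holding a steady 25k files/s for all 10M files
reiserfs_seconds = seconds_to_create(10_000_000, 25_000)    # 400.0 s

# ext3+htree: the last 9M files at "under 10k/s" take at least this long
ext3_tail_seconds = seconds_to_create(9_000_000, 10_000)    # 900.0 s

# The 3 million file run on the 5400 RPM IDE disk, "about 26 minutes"
todd_rate = files_per_second(3_000_000, 26)

print(reiserfs_seconds, ext3_tail_seconds, round(todd_rate))
```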

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


