* Maximum files per Directory
@ 2001-05-01 20:48 Andreas Rogge
2001-05-01 20:58 ` H. Peter Anvin
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Andreas Rogge @ 2001-05-01 20:48 UTC (permalink / raw)
To: linux-kernel
While trying to create 100.000 (in words: one hundred thousand) Mailboxes
with
cyrus-imapd i ran into problems.
At about 2^15 files the filesystem gave up, telling me that there cannot be
more files in a directory.
Is this a vfs-Issue or an ext2-issue?
Andreas Rogge <lu01@rogge.yi.org>
Available on IRCnet:#linux.de as Dyson
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Maximum files per Directory
2001-05-01 20:48 Maximum files per Directory Andreas Rogge
@ 2001-05-01 20:58 ` H. Peter Anvin
2001-05-01 22:57 ` Andreas Dilger
2001-05-01 21:02 ` Alan Cox
2001-05-02 9:21 ` Henning P. Schmiedehausen
2 siblings, 1 reply; 14+ messages in thread
From: H. Peter Anvin @ 2001-05-01 20:58 UTC (permalink / raw)
To: linux-kernel
Followup to: <272800000.988750082@hades>
By author: Andreas Rogge <lu01@rogge.yi.org>
In newsgroup: linux.dev.kernel
>
> While trying to create 100.000 (in words: one hundred thousand) Mailboxes
> with
> cyrus-imapd i ran into problems.
> At about 2^15 files the filesystem gave up, telling me that there cannot be
> more files in a directory.
>
> Is this a vfs-Issue or an ext2-issue?
>
Not correct, there can't be more than 2^15 *directories* in a single
directory. I belive this is an ext2 limitation.
-hpa
--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Maximum files per Directory
2001-05-01 20:48 Maximum files per Directory Andreas Rogge
2001-05-01 20:58 ` H. Peter Anvin
@ 2001-05-01 21:02 ` Alan Cox
2001-05-01 22:03 ` H. Peter Anvin
2001-05-02 13:33 ` Ketil Froyn
2001-05-02 9:21 ` Henning P. Schmiedehausen
2 siblings, 2 replies; 14+ messages in thread
From: Alan Cox @ 2001-05-01 21:02 UTC (permalink / raw)
To: Andreas Rogge; +Cc: linux-kernel
> cyrus-imapd i ran into problems.
> At about 2^15 files the filesystem gave up, telling me that there cannot be
> more files in a directory.
>
> Is this a vfs-Issue or an ext2-issue?
Bit of both. You exceeded the max link count, and your performance would have
been abominable too. cyrus should be using heirarchies of directories for
very large amounts of stuff.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Maximum files per Directory
2001-05-01 21:02 ` Alan Cox
@ 2001-05-01 22:03 ` H. Peter Anvin
2001-05-02 10:22 ` Ingo Oeser
2001-05-02 13:33 ` Ketil Froyn
1 sibling, 1 reply; 14+ messages in thread
From: H. Peter Anvin @ 2001-05-01 22:03 UTC (permalink / raw)
To: linux-kernel
Followup to: <E14uhI2-0002NH-00@the-village.bc.nu>
By author: Alan Cox <alan@lxorguk.ukuu.org.uk>
In newsgroup: linux.dev.kernel
>
> > cyrus-imapd i ran into problems.
> > At about 2^15 files the filesystem gave up, telling me that there cannot be
> > more files in a directory.
> >
> > Is this a vfs-Issue or an ext2-issue?
>
> Bit of both. You exceeded the max link count, and your performance would have
> been abominable too. cyrus should be using heirarchies of directories for
> very large amounts of stuff.
>
But also showing, once again, that this particular scalability problem
really is a headache for some people.
-hpa
--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Maximum files per Directory
2001-05-01 20:58 ` H. Peter Anvin
@ 2001-05-01 22:57 ` Andreas Dilger
2001-05-04 13:49 ` Chris Mason
0 siblings, 1 reply; 14+ messages in thread
From: Andreas Dilger @ 2001-05-01 22:57 UTC (permalink / raw)
To: Linux kernel development list
H. Peter Anvin writes:
> Not correct, there can't be more than 2^15 *directories* in a single
> directory. I belive this is an ext2 limitation.
This is imposed by a number of issues:
- EXT2_LINK_MAX=32000 is checked for new subdirectories
- ext2 bg_used_dirs_count is a __u16
- inode->i_nlink (__kernel_nlink_t) is an unsigned short for some platforms
For stat (old interface) the st_nlinks count is also an unsigned short, so
we _should_ be able to increase EXT2_LINK_MAX to 65500 or so safely. The
VFS will have problems if you increase the max link count over 65535 because
__kernel_nlink_t is __u16.
I see that reiserfs plays some tricks with the directory i_nlink count.
If you exceed 64536 links in a directory, it reverts to "1" and no longer
tracks the link count.
You will have problems with performance for directories this large on
stock ext2, unless you use Daniel Phillips' indexed directory patch.
I have tested 100k+ _files_ in a single directory without problems
(Daniel has tested 1M _files_ without problems). I would NOT reccommend
doing this on your production mail server at this time, but it may be
worth testing at least... It does not (yet) address the issue of lots of
subdirectories, but that is something that can be worked on at least.
http://kernelnewbies.org/~phillips/htree/
Cheers, Andreas
--
Andreas Dilger Turbolinux filesystem development
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Maximum files per Directory
2001-05-01 20:48 Maximum files per Directory Andreas Rogge
2001-05-01 20:58 ` H. Peter Anvin
2001-05-01 21:02 ` Alan Cox
@ 2001-05-02 9:21 ` Henning P. Schmiedehausen
2 siblings, 0 replies; 14+ messages in thread
From: Henning P. Schmiedehausen @ 2001-05-02 9:21 UTC (permalink / raw)
To: linux-kernel
Andreas Rogge <lu01@rogge.yi.org> writes:
>While trying to create 100.000 (in words: one hundred thousand) Mailboxes
>with
>cyrus-imapd i ran into problems.
>At about 2^15 files the filesystem gave up, telling me that there cannot be
>more files in a directory.
Ugh. Went into this on a NetApp Filer some years ago, too.
Easy solution: Use multiple partitions with cyrus.
I also have a hashing patch for cyrus somewhere.
Does ReiserFS help here?
Regards
Henning
--
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH hps@intermeta.de
Am Schwabachgrund 22 Fon.: 09131 / 50654-0 info@intermeta.de
D-91054 Buckenhof Fax.: 09131 / 50654-20
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Maximum files per Directory
2001-05-01 22:03 ` H. Peter Anvin
@ 2001-05-02 10:22 ` Ingo Oeser
2001-05-02 16:13 ` H. Peter Anvin
0 siblings, 1 reply; 14+ messages in thread
From: Ingo Oeser @ 2001-05-02 10:22 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: linux-kernel
On Tue, May 01, 2001 at 03:03:44PM -0700, H. Peter Anvin wrote:
> > Bit of both. You exceeded the max link count, and your
> > performance would have been abominable too. cyrus should be
> > using heirarchies of directories for very large amounts of
> > stuff.
Right.
> But also showing, once again, that this particular scalability problem
> really is a headache for some people.
If you do ls on that directory as an admin, you'll see, what the
REAL cause of this headache is:
The application doing such stupid thing!
People (writing applications) building up such large directories
should be forced to read every entry of it aloud.
Then they'll learn[1] and the problem is solved.
Regards
Ingo Oeser
[1] If not, let them repeat until they do.
--
10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag>
<<<<<<<<<<<< been there and had much fun >>>>>>>>>>>>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Maximum files per Directory
2001-05-01 21:02 ` Alan Cox
2001-05-01 22:03 ` H. Peter Anvin
@ 2001-05-02 13:33 ` Ketil Froyn
1 sibling, 0 replies; 14+ messages in thread
From: Ketil Froyn @ 2001-05-02 13:33 UTC (permalink / raw)
To: Alan Cox; +Cc: lu01, linux-kernel
On Tue, 1 May 2001, Alan Cox wrote:
> > cyrus-imapd i ran into problems.
> > At about 2^15 files the filesystem gave up, telling me that there cannot be
> > more files in a directory.
> >
> > Is this a vfs-Issue or an ext2-issue?
>
> Bit of both. You exceeded the max link count, and your performance would have
> been abominable too. cyrus should be using heirarchies of directories for
> very large amounts of stuff.
That's not always best, is it? I've been testing a bit with reiserfs, and
with LOTS of files, I lose performance with a lot of directories compared
to putting all the files in one directory.
Of course, that is only read-performance. Write performance is enhanced
(at least when creating new files) by splitting this into some more
directories. So how you want to split this up depends whether your data is
write-many-read-once or write-once-read-many or something in between. That
is my experience with reiserfs, anyway.
Ketil
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Maximum files per Directory
2001-05-02 10:22 ` Ingo Oeser
@ 2001-05-02 16:13 ` H. Peter Anvin
0 siblings, 0 replies; 14+ messages in thread
From: H. Peter Anvin @ 2001-05-02 16:13 UTC (permalink / raw)
To: Ingo Oeser; +Cc: H. Peter Anvin, linux-kernel
Ingo Oeser wrote:
>
> On Tue, May 01, 2001 at 03:03:44PM -0700, H. Peter Anvin wrote:
> > > Bit of both. You exceeded the max link count, and your
> > > performance would have been abominable too. cyrus should be
> > > using heirarchies of directories for very large amounts of
> > > stuff.
> Right.
>
> > But also showing, once again, that this particular scalability problem
> > really is a headache for some people.
>
> If you do ls on that directory as an admin, you'll see, what the
> REAL cause of this headache is:
>
> The application doing such stupid thing!
>
> People (writing applications) building up such large directories
> should be forced to read every entry of it aloud.
>
> Then they'll learn[1] and the problem is solved.
>
"Violence is the last refuge of the incompetent."
Seriously, I don't buy this "the application is doing something stupid."
The application is using the VFS the way it is advertised to work. If
you think doing ls on an extrememly large directory is painful, you have
never seen the droppings of an application which tries to do
load-balancing between directories by doing real hashing. THAT is
painful! At least in the first case you can use grep.
The only ones we fool by repeating the mantra "stupid admin, stupid
application" is ourselves.
-hpa
--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Maximum files per Directory
2001-05-01 22:57 ` Andreas Dilger
@ 2001-05-04 13:49 ` Chris Mason
2001-05-04 19:15 ` Andreas Dilger
0 siblings, 1 reply; 14+ messages in thread
From: Chris Mason @ 2001-05-04 13:49 UTC (permalink / raw)
To: Andreas Dilger, Linux kernel development list
On Tuesday, May 01, 2001 04:57:02 PM -0600 Andreas Dilger
<adilger@turbolinux.com> wrote:
> H. Peter Anvin writes:
>> Not correct, there can't be more than 2^15 *directories* in a single
>> directory. I belive this is an ext2 limitation.
>
>
> I see that reiserfs plays some tricks with the directory i_nlink count.
> If you exceed 64536 links in a directory, it reverts to "1" and no longer
> tracks the link count.
Correct. The link count isn't used at all when deciding if the directory
is empty (we use the size instead), so we can just lie to VFS if someone
tries to make tons of subdirs.
-chris
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Maximum files per Directory
2001-05-04 13:49 ` Chris Mason
@ 2001-05-04 19:15 ` Andreas Dilger
2001-05-04 20:08 ` Chris Mason
0 siblings, 1 reply; 14+ messages in thread
From: Andreas Dilger @ 2001-05-04 19:15 UTC (permalink / raw)
To: Chris Mason; +Cc: Andreas Dilger, Linux kernel development list
Chris writes:
> On Tuesday, May 01, 2001 04:57:02 PM -0600 Andreas Dilger
> <adilger@turbolinux.com> wrote:
> > I see that reiserfs plays some tricks with the directory i_nlink count.
> > If you exceed 64536 links in a directory, it reverts to "1" and no longer
> > tracks the link count.
>
> Correct. The link count isn't used at all when deciding if the directory
> is empty (we use the size instead), so we can just lie to VFS if someone
> tries to make tons of subdirs.
For that matter, ext2 doesn't use the link count on directories to determine
if they are empty either, so it shouldn't be too hard to do the same with
the ext2 indexed-directory code. Is there a reason that reiserfs chose to
have "large number of directories" represented by "1" and not "LINK_MAX+1"?
Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Maximum files per Directory
2001-05-04 19:15 ` Andreas Dilger
@ 2001-05-04 20:08 ` Chris Mason
2001-05-05 13:49 ` Jamie Lokier
0 siblings, 1 reply; 14+ messages in thread
From: Chris Mason @ 2001-05-04 20:08 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Linux kernel development list
On Friday, May 04, 2001 01:15:22 PM -0600 Andreas Dilger
<adilger@turbolinux.com> wrote:
> Chris writes:
>> On Tuesday, May 01, 2001 04:57:02 PM -0600 Andreas Dilger
>> <adilger@turbolinux.com> wrote:
>> > I see that reiserfs plays some tricks with the directory i_nlink count.
>> > If you exceed 64536 links in a directory, it reverts to "1" and no
>> > longer tracks the link count.
>>
>> Correct. The link count isn't used at all when deciding if the directory
>> is empty (we use the size instead), so we can just lie to VFS if someone
>> tries to make tons of subdirs.
>
> For that matter, ext2 doesn't use the link count on directories to
> determine if they are empty either, so it shouldn't be too hard to do the
> same with the ext2 indexed-directory code. Is there a reason that
> reiserfs chose to have "large number of directories" represented by "1"
> and not "LINK_MAX+1"?
>
find and a few others consider a link count of 1 to mean there is no link
count tracking being done.
-chris
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Maximum files per Directory
2001-05-04 20:08 ` Chris Mason
@ 2001-05-05 13:49 ` Jamie Lokier
2001-05-05 16:16 ` Chris Mason
0 siblings, 1 reply; 14+ messages in thread
From: Jamie Lokier @ 2001-05-05 13:49 UTC (permalink / raw)
To: Chris Mason; +Cc: Andreas Dilger, Linux kernel development list
Chris Mason wrote:
> > Is there a reason that
> > reiserfs chose to have "large number of directories" represented by "1"
> > and not "LINK_MAX+1"?
>
> find and a few others consider a link count of 1 to mean there is no link
> count tracking being done.
Indeed, and thank you for getting this right!
Btw, is it possible to add dirent->d_type information to reiserfs, and
would there be any performance gain in doing so?
I have code to add d_type for every other filesystem that can support it
without additional disk reads, but I couldn't figure out whether
reiserfs can do it or whether stat() following readdir() is cheap anyway.
-- Jamie (who has written a find-like program)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Maximum files per Directory
2001-05-05 13:49 ` Jamie Lokier
@ 2001-05-05 16:16 ` Chris Mason
0 siblings, 0 replies; 14+ messages in thread
From: Chris Mason @ 2001-05-05 16:16 UTC (permalink / raw)
To: Jamie Lokier; +Cc: Andreas Dilger, Linux kernel development list
On Saturday, May 05, 2001 03:49:20 PM +0200 Jamie Lokier
<lk@tantalophile.demon.co.uk> wrote:
> Chris Mason wrote:
>> > Is there a reason that
>> > reiserfs chose to have "large number of directories" represented by "1"
>> > and not "LINK_MAX+1"?
>>
>> find and a few others consider a link count of 1 to mean there is no link
>> count tracking being done.
>
> Indeed, and thank you for getting this right!
>
> Btw, is it possible to add dirent->d_type information to reiserfs, and
> would there be any performance gain in doing so?
reiserfs doesn't store that information in its directory items right now,
but there are plenty of free bits to do so. It wouldn't be hard to add the
feature, and yes there should be a performance gain.
>
> I have code to add d_type for every other filesystem that can support it
> without additional disk reads, but I couldn't figure out whether
> reiserfs can do it or whether stat() following readdir() is cheap anyway.
stat is actually a little more expensive than ext2, since we have to search
for the inode data in the tree. It is a fast search, but...
-chris
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2001-05-05 16:18 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-05-01 20:48 Maximum files per Directory Andreas Rogge
2001-05-01 20:58 ` H. Peter Anvin
2001-05-01 22:57 ` Andreas Dilger
2001-05-04 13:49 ` Chris Mason
2001-05-04 19:15 ` Andreas Dilger
2001-05-04 20:08 ` Chris Mason
2001-05-05 13:49 ` Jamie Lokier
2001-05-05 16:16 ` Chris Mason
2001-05-01 21:02 ` Alan Cox
2001-05-01 22:03 ` H. Peter Anvin
2001-05-02 10:22 ` Ingo Oeser
2001-05-02 16:13 ` H. Peter Anvin
2001-05-02 13:33 ` Ketil Froyn
2001-05-02 9:21 ` Henning P. Schmiedehausen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox