public inbox for linux-ext4@vger.kernel.org
* How many files to create in one directory?
@ 2014-01-27  7:16 Masato Minda
  2014-01-27 18:02 ` Eric Sandeen
  0 siblings, 1 reply; 6+ messages in thread
From: Masato Minda @ 2014-01-27  7:16 UTC
  To: linux-ext4

Dear Ext4 Developer;

While copying files from VxFS to ext4, I saw the message “Directory
index full!”. I searched the web and found these mails.

  http://www.spinics.net/lists/linux-ext4/msg25058.html
  http://www.spinics.net/lists/linux-ext4/msg25069.html

Our ext4 block size was 1024. After I changed the block size from 1024 to
4096, the problem was solved. But I have a question.

How many files can be created in one directory with a 4096-byte block size?
I think that it is about 3 million files with perfect hashing.

  16*(4096-16)/8*(4096-8)/8*3/4= 3M

Is this correct?
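
For reference, the expression above can be evaluated directly. This is
only an arithmetic check in Python; the message does not spell out what
each factor stands for, so the sketch evaluates the expression as written
and does not attach meanings to the terms.

```python
# Evaluate the estimate exactly as written in the message.
# Python applies * and / left to right, matching the written order.
estimate = 16 * (4096 - 16) / 8 * (4096 - 8) / 8 * 3 / 4

print(f"{estimate:,.0f} entries")  # about 3.1 million
```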

Thanks in advance.

--
Masato minmin Minda <minmin@jprs.co.jp>
Japan Registry Services Co., Ltd. (JPRS)

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: How many files to create in one directory?
  2014-01-27  7:16 How many files to create in one directory? Masato Minda
@ 2014-01-27 18:02 ` Eric Sandeen
  2014-01-27 19:39   ` Theodore Ts'o
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Sandeen @ 2014-01-27 18:02 UTC
  To: Masato Minda, linux-ext4

On 1/27/14, 1:16 AM, Masato Minda wrote:
> Dear Ext4 Developer;
> 
> While copying files from VxFS to ext4, I saw the message “Directory
> index full!”. I searched the web and found these mails.
> 
>   http://www.spinics.net/lists/linux-ext4/msg25058.html
>   http://www.spinics.net/lists/linux-ext4/msg25069.html
> 
> Our ext4 block size was 1024. After I changed the block size from 1024 to
> 4096, the problem was solved. But I have a question.
> 
> How many files can be created in one directory with a 4096-byte block size?
> I think that it is about 3 million files with perfect hashing.
> 
>   16*(4096-16)/8*(4096-8)/8*3/4= 3M
> 
> Is this correct?

It will depend on the length of the filenames.  But by my calculations,
for average 28-char filenames, it's closer to 30 million.

There are (4096-32)/8 indices per block, or 508.
There are 2 levels, so 508*508=258064 leaf blocks.
The length of each record for 28 char names would be 32 bytes.
So you can fit 4096/32 = 128 entries per leaf block.
258064 leaf blocks * 128 entries/block is 33,032,192 entries.
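
The steps above can be sketched in Python. This is a minimal sketch that
mirrors the numbers in this message (32 bytes of index-block overhead,
8-byte index entries, 2 levels, 32-byte records); the function and
parameter names are made up for illustration and are not taken from the
spreadsheet or from the kernel.

```python
def max_htree_entries(block_size=4096, index_overhead=32, index_entry_size=8,
                      levels=2, record_len=32):
    """Upper bound on directory entries, following the steps in the message."""
    # Index entries that fit in one index block: (4096 - 32) / 8 = 508.
    indices_per_block = (block_size - index_overhead) // index_entry_size
    # With two index levels, each index entry fans out again.
    leaf_blocks = indices_per_block ** levels
    # Directory records that fit in one completely full leaf block.
    entries_per_leaf = block_size // record_len
    return indices_per_block, leaf_blocks, entries_per_leaf, leaf_blocks * entries_per_leaf


indices, leaves, per_leaf, total = max_htree_entries()
print(indices, leaves, per_leaf, f"{total:,}")
```

Passing block_size=1024 to the same function shows why the smaller block
size runs into "Directory index full!" so much sooner: both the fan-out
and the number of entries per leaf block shrink.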

I recently made a spreadsheet to calculate this.
I'm not sure if I am doing google docs sharing and protection
correctly, but this might work:

https://docs.google.com/spreadsheet/ccc?key=0AtdHTZsZ8XoYdE1IUXlDb1RXQkdPM3F4YWpfNGhMbFE&usp=sharing#gid=0

-Eric

> Thanks in advance.
> 
> --
> Masato minmin Minda <minmin@jprs.co.jp>
> Japan Registry Services Co., Ltd. (JPRS)


* Re: How many files to create in one directory?
  2014-01-27 18:02 ` Eric Sandeen
@ 2014-01-27 19:39   ` Theodore Ts'o
  2014-01-27 19:48     ` Eric Sandeen
  0 siblings, 1 reply; 6+ messages in thread
From: Theodore Ts'o @ 2014-01-27 19:39 UTC
  To: Eric Sandeen; +Cc: Masato Minda, linux-ext4

> It will depend on the length of the filenames.  But by my calculations,
> for average 28-char filenames, it's closer to 30 million.

Note that there will be some very significant performance problems
well before a directory gets that big.  For example, just simply doing
a readdir + stat on all of the files in that directory (or a readdir +
unlink, etc.) will very likely result in extremely unacceptable
performance.

So if you can find some other way to avoid letting a directory grow
that big (e.g., using a real database instead of trying to use a file
system as a database), I'd strongly suggest that you consider those
alternatives.

Regards,

					- Ted


* Re: How many files to create in one directory?
  2014-01-27 19:39   ` Theodore Ts'o
@ 2014-01-27 19:48     ` Eric Sandeen
  2014-01-28  2:53       ` Masato Minda
  2014-01-28 21:02       ` Andreas Dilger
  0 siblings, 2 replies; 6+ messages in thread
From: Eric Sandeen @ 2014-01-27 19:48 UTC
  To: Theodore Ts'o; +Cc: Masato Minda, linux-ext4

On 1/27/14, 1:39 PM, Theodore Ts'o wrote:
>> It will depend on the length of the filenames.  But by my calculations,
>> for average 28-char filenames, it's closer to 30 million.
> 
> Note that there will be some very significant performance problems
> well before a directory gets that big.  For example, just simply doing
> a readdir + stat on all of the files in that directory (or a readdir +
> unlink, etc.) will very likely result in extremely unacceptable
> performance.

Yep, that's the max possible, not the max useable.  ;)

(Although, I'm not sure in practice what max useable looks like, TBH).

-Eric

> So if you can find some other way to avoid letting a directory grow
> that big (e.g., using a real database instead of trying to use a file
> system as a database), I'd strongly suggest that you consider those
> alternatives.
> 
> Regards,
> 
> 					- Ted
> 



* Re: How many files to create in one directory?
  2014-01-27 19:48     ` Eric Sandeen
@ 2014-01-28  2:53       ` Masato Minda
  2014-01-28 21:02       ` Andreas Dilger
  1 sibling, 0 replies; 6+ messages in thread
From: Masato Minda @ 2014-01-28  2:53 UTC
  To: Eric Sandeen, Theodore Ts'o; +Cc: linux-ext4

Eric-san, Ted-san;

Thank you very much. I am happy now.

On 2014/01/28 3:02, Eric Sandeen wrote:
>
> It will depend on the length of the filenames.  But by my calculations,
> for average 28-char filenames, it's closer to 30 million.
> 
> There are (4096-32)/8 indices per block, or 508.
> There are 2 levels, so 508*508=258064 leaf blocks.
> The length of each record for 28 char names would be 32 bytes.
> So you can fit 4096/32 = 128 entries per leaf block.
> 258064 leaf blocks * 128 entries/block is 33,032,192 entries.

I understand.

> I recently made a spreadsheet to calculate this.
> I'm not sure if I am doing google docs sharing and protection
> correctly, but this might work:
> 
> https://docs.google.com/spreadsheet/ccc?key=0AtdHTZsZ8XoYdE1IUXlDb1RXQkdPM3F4YWpfNGhMbFE&usp=sharing#gid=0

Great! It is useful for us.

On 2014/01/28 4:39, Theodore Ts'o wrote:
>
> Note that there will be some very significant performance problems
> well before a directory gets that big.  For example, just simply doing
> a readdir + stat on all of the files in that directory (or a readdir +
> unlink, etc.) will very likely result in extremely unacceptable
> performance.

Of course, I am aware of that issue. But we already have this directory.

	$ \ls -f | wc
	1933497 1933497 14968002

This is for a mail archive. :-(
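
The same kind of count can be taken from a script without sorting or
stat-ing anything. The sketch below is one possible way, using Python's
os.scandir; the function name is hypothetical. Unlike `ls -f`,
os.scandir does not report the "." and ".." entries, so its count is
expected to be lower by two.

```python
import os


def count_entries(path):
    """Count directory entries by streaming them, without sorting or stat()."""
    count = 0
    # os.scandir yields entries lazily, so a directory with millions of
    # names is never held in memory as one big sorted list.
    with os.scandir(path) as entries:
        for _ in entries:
            count += 1
    return count


print(count_entries("."))
```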

On 2014/01/28 4:48, Eric Sandeen wrote:
> 
> Yep, that's the max possible, not the max useable.  ;)

Yes, I wanted to know the limitation.

Again, thank you very much.

Best Regards,

--
Masato minmin Minda <minmin@jprs.co.jp>
Japan Registry Services Co., Ltd. (JPRS)



* Re: How many files to create in one directory?
  2014-01-27 19:48     ` Eric Sandeen
  2014-01-28  2:53       ` Masato Minda
@ 2014-01-28 21:02       ` Andreas Dilger
  1 sibling, 0 replies; 6+ messages in thread
From: Andreas Dilger @ 2014-01-28 21:02 UTC
  To: Eric Sandeen; +Cc: Theodore Ts'o, Masato Minda, Ext4 Developers List

On Jan 27, 2014, at 12:48 PM, Eric Sandeen <sandeen@redhat.com> wrote:
> On 1/27/14, 1:39 PM, Theodore Ts'o wrote:
>>> It will depend on the length of the filenames.  But by my calculations,
>>> for average 28-char filenames, it's closer to 30 million.

Note that there is also a 2GB directory size limit imposed by not using
i_size_high for directories.  That works out to be about:

  (2^30 bytes / 4096 bytes/block) *
  ((4096 bytes/block / (28 + 4 + 4 bytes/entry)) * 0.75 full) ~= 22M entries
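
The estimate above can be reproduced with a few lines of Python. This is
a sketch with hypothetical names. Note that the formula as written uses
2^30 bytes while the text mentions 2GB, so the size limit is left as a
parameter and both values can be tried. The 36-byte entry (28 + 4 + 4)
and the 0.75 fill factor are taken from the formula.

```python
def entries_within_size_limit(size_limit_bytes, block_size=4096,
                              name_len=28, entry_overhead=4 + 4, fill=0.75):
    """Approximate entry count for a directory capped at size_limit_bytes."""
    blocks = size_limit_bytes // block_size
    # Entries per block if every block were completely full.
    entries_per_block = block_size / (name_len + entry_overhead)
    # Leaf blocks are assumed to be about 75% full on average.
    return blocks * entries_per_block * fill


for limit in (2**30, 2**31):
    print(f"{limit} bytes -> {entries_within_size_limit(limit):,.0f} entries")
```

With 2^30 bytes this gives the ~22M figure quoted above; with 2^31 bytes
the same formula gives twice that.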

We have a patch that allows using i_size_high for directories, and
adding 3rd level htree support for small block filesystems or very
large directories.  However, we haven't written e2fsck support for
it and it isn't currently enabled.

If someone is interested in taking a look at this:
http://git.whamcloud.com/?p=fs/lustre-release.git;a=blob;f=ldiskfs/kernel_patches/patches/sles11sp2/ext4-pdirop.patch;h=4d2acffadaa31a1bdd9f3a592cda71dfcdd585a4;hb=HEAD

The "htree lock" part of the patch is for allowing parallel
create/lookup/unlink access to the large directory, but last time
I asked Al Viro about this he didn't seem interested in exporting
that functionality to the VFS.

>> Note that there will be some very significant performance problems
>> well before a directory gets that big.  For example, just simply doing
>> a readdir + stat on all of the files in that directory (or a readdir +
>> unlink, etc.) will very likely result in extremely unacceptable
>> performance.
> 
> Yep, that's the max possible, not the max useable.  ;)

In newer kernels it is also possible to put an upper limit on the size
of a directory via /sys/fs/ext4/{dev}/max_dir_size_kb tunable or mount
option.  This prevents users from creating directories that are so big
they can't be handled by normal tools.
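
One way to pick a value for that limit is to work backwards from the
number of entries you are willing to allow. The Python sketch below does
that under stated assumptions: 36-byte entries and 75% full leaf blocks,
borrowed from the estimate earlier in this message, with htree index
blocks ignored. The function name is hypothetical, and the result is a
rough starting point, not an exact bound.

```python
import math


def max_dir_size_kb_for(entries, block_size=4096, name_len=28,
                        entry_overhead=8, fill=0.75):
    """Rough directory size in KB needed to hold `entries` names."""
    # Whole entries per leaf block, scaled by the assumed fill factor.
    entries_per_block = (block_size // (name_len + entry_overhead)) * fill
    blocks = math.ceil(entries / entries_per_block)
    return blocks * block_size // 1024


kb = max_dir_size_kb_for(1_000_000)
# The value can be given as a mount option or written to the sysfs file.
print(f"max_dir_size_kb={kb}")
```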

> (Although, I'm not sure in practice what max useable looks like, TBH).

We regularly test with 10M files per directory.  Obviously, workloads
that do this do not use "ls -l" or equivalent, but just lookup-by-name
from within applications.  It is usable in our testing up to about 15M
entries before there can start being problems with level-2 leaf blocks
getting full (due to uneven usage of the leaf blocks).

Cheers, Andreas

>> So if you can find some other way to avoid letting a directory grow
>> that big (e.g., using a real database instead of trying to use a file
>> system as a database), I'd strongly suggest that you consider those
>> alternatives.
>> 
>> Regards,
>> 
>> 					- Ted
>> 
> 
