* [RFC] Add inline data support in ext4
From: Tao Ma @ 2011-09-27 7:11 UTC
To: Ted Ts'o, ext4 development, Andreas Dilger
Hi Ted, Andreas and the list,
As you may already know, we have begun evaluating the bigalloc feature
in our production system. The performance looks promising, but we have
also run into a severe problem with bigalloc.
Because ext4 currently allocates one block for every directory, even an
empty one, this wastes a lot of space for applications that use hashes
and create large numbers of directories (AUFS in squid, for example).
ocfs2 now uses inline data for a newly created file/dir so that small
ones can keep their data within the inode. It is really helpful, and we
are considering adding the same to ext4.
What is your opinion? I haven't been involved in ext4 for a long time,
so I am not sure whether there was an earlier attempt at this that was
eventually abandoned. Anyway, with bigalloc added, we really need
inline data support now.
Thanks
Tao
* Re: [RFC] Add inline data support in ext4
From: Andreas Dilger @ 2011-09-27 19:34 UTC
To: Tao Ma; +Cc: Ted Ts'o, ext4 development
On 2011-09-27, at 1:11 AM, Tao Ma wrote:
> Hi Ted, Andreas and the list,
> As you may already know, we have begun evaluating the bigalloc
> feature in our production system. The performance looks promising,
> but we have also run into a severe problem with bigalloc.
>
> Because ext4 currently allocates one block for every directory, even
> an empty one, this wastes a lot of space for applications that use
> hashes and create large numbers of directories (AUFS in squid, for
> example).
>
> ocfs2 now uses inline data for a newly created file/dir so that small
> ones can keep their data within the inode. It is really helpful, and
> we are considering adding the same to ext4.
>
> What is your opinion? I haven't been involved in ext4 for a long time,
> so I am not sure whether there was an earlier attempt at this that was
> eventually abandoned. Anyway, with bigalloc added, we really need
> inline data support now.
At one time we discussed storing file tails in xattrs to allow small
files to be stored inside the inode itself. There is already an
EXT2_TAIL_FL that was used on reiserfs that could be reused for ext4,
though it would need a new INCOMPAT feature flag. This idea could be
expanded to sharing a single bigalloc chunk as an xattr block between
multiple files, with each one storing its file/dir data in a
"system.data" xattr (or something similar).
For small directories, the "." and ".." entries could even be stored
inside the inode in this "system.data" xattr, since they are only 24
bytes in size and there are ~100 bytes of xattr space in a 256-byte
inode. By making all "small data" (smaller than, say, 1/2 of a chunk)
an xattr, the xattr code can use the most efficient location for the
storage, either inside the inode, or in a shared block.
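The 24-byte figure follows from the directory entry layout: an 8-byte header (inode number, record length, name length, file type) plus the name, rounded up to a multiple of 4. Below is a minimal Python sketch of that arithmetic. The function name `dirent_rec_len` and the constant `XATTR_BUDGET` are illustrative only, the 100-byte budget is simply the "~100 bytes" estimate quoted above, and any per-xattr header overhead is ignored.

```python
DIRENT_HEADER = 8  # inode (4) + rec_len (2) + name_len (1) + file_type (1)
DIRENT_ALIGN = 4   # each entry is padded to a 4-byte boundary


def dirent_rec_len(name_len: int) -> int:
    """Minimum bytes needed for a directory entry whose name is name_len bytes."""
    return (DIRENT_HEADER + name_len + DIRENT_ALIGN - 1) & ~(DIRENT_ALIGN - 1)


dot = dirent_rec_len(len("."))      # 8 + 1, rounded up to 12
dotdot = dirent_rec_len(len(".."))  # 8 + 2, rounded up to 12

# Assumption: roughly 100 bytes of in-inode xattr space in a 256-byte inode.
XATTR_BUDGET = 100

print(dot + dotdot, XATTR_BUDGET - (dot + dotdot))  # prints: 24 76
```

Under these assumptions, "." and ".." together leave about 76 bytes for further entries.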
I read once that there are many directories with only one or two
files in them, and 100 bytes could hold 3 or 4 dirents, or more
for larger inodes. This would probably be an improvement even for
non-bigalloc filesystems, since small directories could be handled
without seeks, as could very small files.
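As a rough check of the "3 or 4 dirents" estimate, the sketch below counts how many entries of a given name length fit in a byte budget. The helper names and the optional `reserved` parameter (space set aside for "." and "..") are hypothetical and only for illustration; xattr header overhead is not modelled.

```python
def dirent_rec_len(name_len: int) -> int:
    """8-byte header plus the name, rounded up to a multiple of 4."""
    return (8 + name_len + 3) & ~3


def dirents_that_fit(budget: int, name_len: int, reserved: int = 0) -> int:
    """Number of equal-sized entries that fit in budget bytes.

    reserved is space already taken, e.g. 24 bytes for "." and "..".
    """
    return max(budget - reserved, 0) // dirent_rec_len(name_len)


# Name lengths chosen for illustration; 19 is the mean reported in the thread.
for name_len in (8, 16, 19, 32):
    print(name_len, dirent_rec_len(name_len), dirents_that_fit(100, name_len))
```

With a 100-byte budget this gives 4 entries for 16-character names and 3 for 19-character names. If 24 bytes are reserved for "." and "..", those drop to 3 and 2.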
A quick check of my home directory shows mostly small subdirectories:
  dirs=44859 files=677028 filename_chars=12909288 mean_chars=19
  dirs: zero_dirent=1609 one_dirent=12937 two_dirent=2456 mean_dirent=17
so more than 37% of directories have 2 or fewer files/subdirs, and the
average size of a directory is ((19 + 3 + 8) * 17) = 510 bytes.
The +3 is for rounding the name up to a multiple of 4, and +8 is
for the inode, length, and type fields in the dirent. The same looks
to be true for /usr as well.
So, in this case, close to half of directories could be held entirely
within the system.data xattr inside a 512-byte inode.
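From the figures above, (1609 + 12937 + 2456) / 44859 is about 37.9%, and (19 + 3 + 8) * 17 = 510. A similar survey can be sketched in Python. This is not the script that produced the numbers above: the function names, the choice to count both files and subdirectories as entries, and the exact rounding (rather than the +3 upper bound) are all assumptions.

```python
import os


def dirent_rec_len(name_len: int) -> int:
    """8-byte header plus the name, rounded up to a multiple of 4."""
    return (8 + name_len + 3) & ~3


def survey(root: str) -> dict:
    """Walk root and tally per-directory entry counts and estimated sizes."""
    stats = {"dirs": 0, "entries": 0, "small": 0, "est_bytes": 0}
    for _path, dirnames, filenames in os.walk(root):
        names = dirnames + filenames  # "." and ".." are not included
        stats["dirs"] += 1
        stats["entries"] += len(names)
        stats["est_bytes"] += sum(
            dirent_rec_len(len(os.fsencode(name))) for name in names
        )
        if len(names) <= 2:
            stats["small"] += 1
    return stats


def report(stats: dict) -> dict:
    """Share of directories with <= 2 entries and mean estimated bytes."""
    dirs = stats["dirs"]
    if dirs == 0:
        return {"small_share": 0.0, "mean_dirent_bytes": 0.0}
    return {
        "small_share": stats["small"] / dirs,
        "mean_dirent_bytes": stats["est_bytes"] / dirs,
    }
```

For example, `report(survey(os.path.expanduser("~")))` would summarise a home directory. Results will differ between systems, and unreadable directories are silently skipped by `os.walk`.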
Cheers, Andreas
* Re: [RFC] Add inline data support in ext4
From: Tao Ma @ 2011-09-28 9:06 UTC
To: Andreas Dilger; +Cc: Ted Ts'o, ext4 development
Hi Andreas,
Thanks for the feedback.
On 09/28/2011 03:34 AM, Andreas Dilger wrote:
> On 2011-09-27, at 1:11 AM, Tao Ma wrote:
>> Hi Ted, Andreas and the list,
>> As you may already know, we have begun evaluating the bigalloc
>> feature in our production system. The performance looks promising,
>> but we have also run into a severe problem with bigalloc.
>>
>> Because ext4 currently allocates one block for every directory, even
>> an empty one, this wastes a lot of space for applications that use
>> hashes and create large numbers of directories (AUFS in squid, for
>> example).
>>
>> ocfs2 now uses inline data for a newly created file/dir so that small
>> ones can keep their data within the inode. It is really helpful, and
>> we are considering adding the same to ext4.
>>
>> What is your opinion? I haven't been involved in ext4 for a long
>> time, so I am not sure whether there was an earlier attempt at this
>> that was eventually abandoned. Anyway, with bigalloc added, we really
>> need inline data support now.
>
> At one time we discussed storing file tails in xattrs to allow small
> files to be stored inside the inode itself. There is already an
> EXT2_TAIL_FL that was used on reiserfs that could be reused for ext4,
> though it would need a new INCOMPAT feature flag. This idea could be
> expanded to sharing a single bigalloc chunk as an xattr block between
> multiple files, with each one storing its file/dir data in a
> "system.data" xattr (or something similar).
>
> For small directories, the "." and ".." entries could even be stored
> inside the inode in this "system.data" xattr, since they are only 24
> bytes in size and there are ~100 bytes of xattr space in a 256-byte
> inode. By making all "small data" (smaller than, say, 1/2 of a chunk)
> an xattr, the xattr code can use the most efficient location for the
> storage, either inside the inode, or in a shared block.
>
> I read once that there are many directories with only one or two
> files in them, and 100 bytes could hold 3 or 4 dirents, or more
> for larger inodes. This would probably be an improvement even for
> non-bigalloc filesystems, since small directories could be handled
> without seeks, as could very small files.
>
> A quick check of my home directory shows mostly small subdirectories:
>
> dirs=44859 files=677028 filename_chars=12909288 mean_chars=19
> dirs: zero_dirent=1609 one_dirent=12937 two_dirent=2456 mean_dirent=17
>
> so more than 37% of directories have 2 or fewer files/subdirs, and the
> average size of a directory is ((19 + 3 + 8) * 17) = 510 bytes.
> The +3 is for rounding the name up to a multiple of 4, and +8 is
> for the inode, length, and type fields in the dirent. The same looks
> to be true for /usr as well.
>
> So, in this case, close to half of directories could be held entirely
> within the system.data xattr inside a 512-byte inode.
Yeah, my home directory actually looks similar to yours, so I guess
others' do too, which makes in-inode data very beneficial. ocfs2 in
fact works much as you describe above: it shares the space after the
inode fields with xattrs, and that works well. So I will try to write
some rough code to test how it works.
Thanks
Tao