From: Tao Ma <tm@tao.ma>
To: Andreas Dilger <adilger@dilger.ca>
Cc: Ted Ts'o <tytso@mit.edu>, ext4 development <linux-ext4@vger.kernel.org>
Subject: Re: [RFC] Add inline data support in ext4
Date: Wed, 28 Sep 2011 17:06:12 +0800
Message-ID: <4E82E384.80405@tao.ma>
In-Reply-To: <4E1AC23D-3CC8-4C8C-BA54-F2AB9958D13A@dilger.ca>

Hi Andreas,
	Thanks for the feedback.
On 09/28/2011 03:34 AM, Andreas Dilger wrote:
> On 2011-09-27, at 1:11 AM, Tao Ma wrote:
>> Hi Ted, Andreas and the list,
>> 	As you may already know, we are beginning to evaluate the
>> bigalloc feature in our production system. The performance looks
>> promising, but we have also run into a serious problem with bigalloc.
>>
>> As ext4 now allocates one block for a directory even if it is empty,
>> this wastes a lot of space for applications that use hashing and
>> create large numbers of directories (AUFS in squid, for example).
>>
>> ocfs2 now uses inline data for a newly created file/dir so that
>> small ones can keep their data within the inode. It is really helpful
>> and we are considering adding the same to ext4.
>>
>> What is your opinion? I haven't been involved in ext4 for a long time,
>> so I am not sure whether there was a similar attempt that was
>> eventually abandoned. Anyway, with bigalloc added, we really need to
>> support inline data now.
> 
> At one time we discussed storing file tails in xattrs to allow small
> files to be stored inside the inode itself.  There is already an
> EXT2_TAIL_FL that was used on reiserfs that could be reused for ext4,
> though it would need a new INCOMPAT feature flag.  This idea could be
> expanded to sharing a single bigalloc chunk as an xattr block between
> multiple files, with each one storing its file/dir data in a
> "system.data" xattr (or something similar).
> 
> For small directories, the "." and ".." entries could even be stored
> inside the inode in this "system.data" xattr, since they are only 24
> bytes in size and there are ~100 bytes of xattr space in a 256-byte
> inode.  By making all "small data" (smaller than, say 1/2 of a chunk)
> an xattr, the xattr code can use the most efficient location for the
> storage, either inside the inode, or in a shared block.
> 
> I read once that there are many directories with only one or two
> files in them, and 100 bytes could hold 3 or 4 dirents, or more
> for larger inodes.  This would probably be an improvement even for
> non-bigalloc filesystems, since small directories could be handled
> without seeks, as could very small files.
> 
> A quick check of my home directory shows mostly small subdirectories:
> 
> dirs=44859 files=677028 filename_chars=12909288 mean_chars=19
> dirs: zero_dirent=1609 one_dirent=12937 two_dirent=2456 mean_dirent=17
> 
> so more than 37% of directories have 2 or fewer files/subdirs, and the
> average size of a directory is ((19 + 3 + 8) * 17) = 510 bytes.
> The +3 is for rounding the name up to a multiple of 4, and +8 is
> for the inode, length, and type fields in the dirent.  The same looks
> to be true for /usr as well.
> 
> So, in this case, close to half of directories could be held entirely
> within the system.data xattr inside a 512-byte inode.
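
The arithmetic quoted above can be reproduced with a short Python
sketch. It assumes the dirent layout described in the quote: an 8-byte
header (inode, length, and type fields) plus the name padded to a
multiple of 4 bytes. The function and variable names are made up for
illustration and are not kernel identifiers.

```python
# Sketch only: reproduces the figures quoted above.
# dirent_size() and every name here are illustrative assumptions.

DIRENT_HEADER = 8  # inode, length, and type fields, per the quoted mail


def dirent_size(name_len: int) -> int:
    """Header plus the name rounded up to a multiple of 4 bytes."""
    return DIRENT_HEADER + ((name_len + 3) & ~3)


# Figures from the quoted home-directory survey.
dirs = 44859
zero_dirent, one_dirent, two_dirent = 1609, 12937, 2456
mean_chars = 19
mean_dirent = 17

# Share of directories holding two or fewer entries.
small_fraction = (zero_dirent + one_dirent + two_dirent) / dirs

# Upper bound used in the mail: always charge 3 bytes of padding.
upper_bound = (mean_chars + 3 + DIRENT_HEADER) * mean_dirent
# The same estimate with the padding actually rounded.
rounded = dirent_size(mean_chars) * mean_dirent

# Space needed for the "." and ".." entries.
dot_entries = dirent_size(len(".")) + dirent_size(len(".."))

print(f"dirs with <= 2 entries: {small_fraction:.1%}")
print(f"mean directory size: {upper_bound} bytes (upper bound), "
      f"{rounded} bytes (rounded)")
print(f"'.' and '..' together: {dot_entries} bytes")
```

The rounded figure comes out a little below 510 because a 19-character
name needs only one byte of padding, not three.
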
Yeah, my home directory looks similar to yours, so I guess others' do
too, which would make in-inode data very beneficial. ocfs2 actually
works the way you describe above: it shares the space after the inode
fields with xattrs, and it works well. So I will try to put together
some rough code to test how it works.

Thanks
Tao
