* [RFC] Add inline data support in ext4
From: Tao Ma @ 2011-09-27 7:11 UTC
To: Ted Ts'o, ext4 development, Andreas Dilger

Hi Ted, Andreas and the list,

As you may already know, we are beginning to evaluate the bigalloc
feature in our production system. The performance looks promising,
but we have also hit a severe problem with bigalloc.

As ext4 now allocates one block for a directory even if it is empty,
it is really space-consuming for some applications which use hashes
and create large numbers of directories (AUFS in squid, for example).

ocfs2 now uses inline data for a newly created file/dir so that some
small ones can have their data within the inodes. It is really helpful
and we are considering adding the same to ext4.

What is your opinion? I haven't been involved in ext4 for a long time,
so I am not sure whether there was a similar attempt which was
eventually abandoned. Anyway, with bigalloc added, we really need to
support inline data now.

Thanks
Tao

* Re: [RFC] Add inline data support in ext4
From: Andreas Dilger @ 2011-09-27 19:34 UTC
To: Tao Ma
Cc: Ted Ts'o, ext4 development

On 2011-09-27, at 1:11 AM, Tao Ma wrote:
> Hi Ted, Andreas and the list,
> As you may already know, we are beginning to evaluate the bigalloc
> feature in our production system. The performance looks promising,
> but we have also hit a severe problem with bigalloc.
>
> As ext4 now allocates one block for a directory even if it is empty,
> it is really space-consuming for some applications which use hashes
> and create large numbers of directories (AUFS in squid, for example).
>
> ocfs2 now uses inline data for a newly created file/dir so that some
> small ones can have their data within the inodes. It is really helpful
> and we are considering adding the same to ext4.
>
> What is your opinion? I haven't been involved in ext4 for a long time,
> so I am not sure whether there was a similar attempt which was
> eventually abandoned. Anyway, with bigalloc added, we really need to
> support inline data now.

At one time we discussed storing file tails in xattrs to allow small
files to be stored inside the inode itself. There is already an
EXT2_TAIL_FL that was used on reiserfs that could be reused for ext4,
though it would need a new INCOMPAT feature flag. This idea could be
expanded to sharing a single bigalloc chunk as an xattr block between
multiple files, with each one storing its file/dir data in a
"system.data" xattr (or something similar).

For small directories, the "." and ".." entries could even be stored
inside the inode in this "system.data" xattr, since they are only 24
bytes in size and there are ~100 bytes of xattr space in a 256-byte
inode. By making all "small data" (smaller than, say, 1/2 of a chunk)
an xattr, the xattr code can use the most efficient location for the
storage, either inside the inode or in a shared block.

I read once that there are many directories with only one or two
files in them, and 100 bytes could hold 3 or 4 dirents, or more
for larger inodes. This would probably be an improvement even for
non-bigalloc filesystems, since small directories could be handled
without seeks, as could very small files.

A quick check of my home directory shows mostly small subdirectories:

dirs=44859 files=677028 filename_chars=12909288 mean_chars=19
dirs: zero_dirent=1609 one_dirent=12937 two_dirent=2456 mean_dirent=17

so more than 37% of directories have 2 or fewer files/subdirs, and the
average size of a directory is ((19 + 3 + 8) * 17) = 510 bytes.
The +3 is for rounding the name up to a multiple of 4, and the +8 is
for the inode, length, and type fields in the dirent. The same looks
to be true for /usr as well.

So, in this case, close to half of directories could be held entirely
within the system.data xattr inside a 512-byte inode.

Cheers, Andreas
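
The arithmetic in the message above can be re-checked with a short
Python sketch. It assumes the dirent layout the message describes (an
8-byte header for the inode, length, and type fields, followed by the
name padded to a multiple of 4 bytes). The function and variable names
are hypothetical and exist only for this illustration.

```python
def dirent_size(name_len: int) -> int:
    """Bytes needed for one directory entry under the assumed layout."""
    header = 8  # inode, length, and type fields, as described in the message
    padded_name = (name_len + 3) // 4 * 4  # round the name up to a multiple of 4
    return header + padded_name


# "." and ".." together: two 12-byte entries.
dot_entries = dirent_size(1) + dirent_size(2)

# Share of directories with 2 or fewer entries, from the quoted counts.
dirs = 44859
small_dirs = 1609 + 12937 + 2456
fraction_small = small_dirs / dirs

# The message's estimate adds a flat +3 for padding, which is an upper bound.
mean_chars, mean_dirent = 19, 17
estimate_from_message = (mean_chars + 3 + 8) * mean_dirent

# With exact rounding, a 19-character name needs 28 bytes, not 30.
exact_mean_size = dirent_size(mean_chars) * mean_dirent

# How many average-sized dirents fit in ~100 bytes of in-inode xattr space.
fit_in_100_bytes = 100 // dirent_size(mean_chars)

print(dot_entries, round(fraction_small, 3), estimate_from_message,
      exact_mean_size, fit_in_100_bytes)
```

The flat +3 slightly overestimates the average directory size, so the
510-byte figure is a conservative bound rather than an exact mean.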

* Re: [RFC] Add inline data support in ext4
From: Tao Ma @ 2011-09-28 9:06 UTC
To: Andreas Dilger
Cc: Ted Ts'o, ext4 development

Hi Andreas,

Thanks for the feedback.

On 09/28/2011 03:34 AM, Andreas Dilger wrote:
> On 2011-09-27, at 1:11 AM, Tao Ma wrote:
>> Hi Ted, Andreas and the list,
>> As you may already know, we are beginning to evaluate the bigalloc
>> feature in our production system. The performance looks promising,
>> but we have also hit a severe problem with bigalloc.
>>
>> As ext4 now allocates one block for a directory even if it is empty,
>> it is really space-consuming for some applications which use hashes
>> and create large numbers of directories (AUFS in squid, for example).
>>
>> ocfs2 now uses inline data for a newly created file/dir so that some
>> small ones can have their data within the inodes. It is really helpful
>> and we are considering adding the same to ext4.
>>
>> What is your opinion? I haven't been involved in ext4 for a long time,
>> so I am not sure whether there was a similar attempt which was
>> eventually abandoned. Anyway, with bigalloc added, we really need to
>> support inline data now.
>
> At one time we discussed storing file tails in xattrs to allow small
> files to be stored inside the inode itself. There is already an
> EXT2_TAIL_FL that was used on reiserfs that could be reused for ext4,
> though it would need a new INCOMPAT feature flag. This idea could be
> expanded to sharing a single bigalloc chunk as an xattr block between
> multiple files, with each one storing its file/dir data in a
> "system.data" xattr (or something similar).
>
> For small directories, the "." and ".." entries could even be stored
> inside the inode in this "system.data" xattr, since they are only 24
> bytes in size and there are ~100 bytes of xattr space in a 256-byte
> inode. By making all "small data" (smaller than, say, 1/2 of a chunk)
> an xattr, the xattr code can use the most efficient location for the
> storage, either inside the inode or in a shared block.
>
> I read once that there are many directories with only one or two
> files in them, and 100 bytes could hold 3 or 4 dirents, or more
> for larger inodes. This would probably be an improvement even for
> non-bigalloc filesystems, since small directories could be handled
> without seeks, as could very small files.
>
> A quick check of my home directory shows mostly small subdirectories:
>
> dirs=44859 files=677028 filename_chars=12909288 mean_chars=19
> dirs: zero_dirent=1609 one_dirent=12937 two_dirent=2456 mean_dirent=17
>
> so more than 37% of directories have 2 or fewer files/subdirs, and the
> average size of a directory is ((19 + 3 + 8) * 17) = 510 bytes.
> The +3 is for rounding the name up to a multiple of 4, and the +8 is
> for the inode, length, and type fields in the dirent. The same looks
> to be true for /usr as well.
>
> So, in this case, close to half of directories could be held entirely
> within the system.data xattr inside a 512-byte inode.

Yeah, my home directory actually looks similar to yours. So I guess
others have similar ones, which makes in-inode data very beneficial.

And ocfs2 actually works the way you describe above. It shares the
space after the inode's fields with xattrs, and it works well. So I
will try to write some rough code to test how it works.

Thanks
Tao
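
For readers who want to run a similar check on their own tree, here is
a minimal Python sketch. The thread does not say how the quoted figures
were gathered, so this is one possible approach rather than the method
actually used. The names `survey` and `demo` are hypothetical; entries
are counted without "." and "..", and name lengths are counted in
characters rather than bytes.

```python
import os
import tempfile


def survey(root):
    """Walk root and tally per-directory entry counts (excluding "." and "..")."""
    stats = {"dirs": 0, "files": 0, "filename_chars": 0,
             "zero_dirent": 0, "one_dirent": 0, "two_dirent": 0}
    buckets = {0: "zero_dirent", 1: "one_dirent", 2: "two_dirent"}
    for _path, dirnames, filenames in os.walk(root):
        entries = dirnames + filenames
        stats["dirs"] += 1
        stats["files"] += len(filenames)
        stats["filename_chars"] += sum(len(name) for name in entries)
        if len(entries) in buckets:
            stats[buckets[len(entries)]] += 1
    return stats


def demo():
    """Build a tiny throwaway tree and survey it."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "empty"))
        os.mkdir(os.path.join(root, "one"))
        open(os.path.join(root, "one", "a.txt"), "w").close()
        return survey(root)


if __name__ == "__main__":
    print(demo())
```

Replacing the call to `demo()` with `survey(os.path.expanduser("~"))`
would survey a home directory instead of the throwaway tree.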