* Default ext inode size
From: Phillip Susi @ 2008-11-13 19:56 UTC
To: linux-fsdevel
I noticed that the default inode size for mkfs in e2fsprogs has been
changed to 256 bytes. I noticed this because I am seeing users complain
that they can no longer access their ext partitions using the Windows
driver, which only supports normal 128-byte inodes. I'd like to know
why this default was changed.
As I understand it, the larger inode size means that ea/acl can be
stored directly in the inode. Are there any other benefits? It seems
that using extended attributes is rather uncommon in the first place,
and that when they are used, many files often share them so it would be
better to leave them in the shared data block rather than duplicate them
in every inode. This leaves me wondering where is the common case that
benefits from a larger default inode?
* Re: Default ext inode size
From: Kalpak Shah @ 2008-11-13 20:14 UTC
To: Phillip Susi; +Cc: linux-fsdevel
On Fri, Nov 14, 2008 at 1:26 AM, Phillip Susi <psusi@cfl.rr.com> wrote:
> I noticed that the default inode size for mkfs in e2fsprogs has been changed
> to 256 bytes. I noticed this because I am seeing users complain that they
> can no longer access their ext partitions using the Windows driver, which
> only supports normal 128-byte inodes. I'd like to know why this default was
> changed.
>
> As I understand it, the larger inode size means that ea/acl can be stored
> directly in the inode. Are there any other benefits? It seems that using
> extended attributes is rather uncommon in the first place, and that when
> they are used, many files often share them so it would be better to leave
> them in the shared data block rather than duplicate them in every inode.
> This leaves me wondering where is the common case that benefits from a
> larger default inode?
The larger inode is also needed to support new features like
nanosecond timestamps, creation time, and 64-bit inode versions.
If you want to override the 256-byte inode default, you can use "-I
128" while formatting your filesystems with mke2fs.
Thanks,
Kalpak
* Re: Default ext inode size
From: Theodore Tso @ 2008-11-13 20:21 UTC
To: Phillip Susi; +Cc: linux-fsdevel
On Thu, Nov 13, 2008 at 02:56:38PM -0500, Phillip Susi wrote:
> I noticed that the default inode size for mkfs in e2fsprogs has been
> changed to 256 bytes. I noticed this because I am seeing users complain
> that they can no longer access their ext partitions using the Windows
> driver, which only supports normal 128-byte inodes. I'd like to know
> why this default was changed.
>
> As I understand it, the larger inode size means that ea/acl can be
> stored directly in the inode. Are there any other benefits?
That's the main one. The other benefit is that ext4 uses a bigger
inode to store some extra fields such as the file creation time,
nanosecond timestamps, and the 64-bit version number needed for
NFSv4's client-side caching.
> It seems
> that using extended attributes is rather uncommon in the first place,
The big users of extended attributes are SELinux, Samba, and Beagle.
Since a number of distributions are now starting to enable SELinux by
default (for better or for worse), it makes a big difference from a
performance perspective for those distributions.
I can't imagine that it would be that hard to fix the Windows driver
to be able to support 256 byte inodes. It should be a one- or
two-line fix, for those people who care.
- Ted
* Re: Default ext inode size
From: Phillip Susi @ 2008-11-13 20:50 UTC
To: Theodore Tso; +Cc: linux-fsdevel
Theodore Tso wrote:
> That's the main one. The other benefit is that ext4 uses a bigger
> inode to store some extra fields such as the file creation time,
> nanosecond timestamps, and the 64-bit version number needed for
> NFSv4's client-side caching.
What about ext3? Does it do the same thing with the larger inode? Does
it get more block pointers or room for block extent lists (in ext4)?
Are the higher resolution timestamps used by default if the inode is
large, or is there a compatibility bit and/or mount option that needs
to be set?
> The big users of extended attributes are SELinux, Samba, and Beagle.
> Since a number of distributions are now starting to enable SELinux by
> default (for better or for worse), it makes a big difference from a
> performance perspective for those distributions.
Does it actually help performance to store the ea in the inode? I would
think it would not make much difference: if many files have the same
attributes, the shared ea block would be cached. Storing the ea in
the inode seems like it duplicates a lot of data and means a given
amount of ram could only cache half as many inodes as with the normal
size, which would lead to fewer cache hits and more disk IO.
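To put rough numbers on the "half as many inodes" concern, here is a minimal sketch. It assumes a 4096-byte block size (nobody in the thread states one) and counts only how many on-disk inodes fit in one cached inode-table block; it says nothing about the kernel's in-memory inode structures. The function name is made up for illustration.

```python
# Assumption: 4096-byte filesystem blocks; the thread does not give a block size.
BLOCK_SIZE = 4096


def inodes_per_block(inode_size, block_size=BLOCK_SIZE):
    """Number of on-disk inodes that fit in one inode-table block."""
    return block_size // inode_size


for size in (128, 256):
    count = inodes_per_block(size)
    print(f"{size}-byte inodes: {count} per {BLOCK_SIZE}-byte block")
```

Under that assumption, doubling the inode size halves the number of inodes each cached inode-table block holds, which is the trade-off being asked about.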
> I can't imagine that it would be that hard to fix the Windows driver
> to be able to support 256 byte inodes. It should be a one- or
> two-line fix, for those people who care.
Probably, but it is an example (and there probably are others) of
problems caused by changing the default, so I'm trying to understand why
ext3 was disturbed in this way rather than just making 256 byte inodes
the default for only ext4.
* Re: Default ext inode size
From: Theodore Tso @ 2008-11-13 21:35 UTC
To: Phillip Susi; +Cc: linux-fsdevel
On Thu, Nov 13, 2008 at 03:50:08PM -0500, Phillip Susi wrote:
> Theodore Tso wrote:
>> That's the main one. The other benefit is that ext4 uses a bigger
>> inode to store some extra fields such as the file creation time,
>> nanosecond timestamps, and the 64-bit version number needed for
>> NFSv4's client-side caching.
>
> What about ext3? Does it do the same thing with the larger inode? Does
> it get more block pointers or room for block extent lists (in ext4)?
> Are the higher resolution timestamps used by default if the inode is
> large, or is there a compatibility bit and/or mount option that needs
> to be set?
No, ext3 doesn't have any of these features; only ext4. It would be
possible to backport things like the high res timestamps to ext3, but
no one has done it to date.
We don't actually use extra block extent lists in ext4 with the extra
space. We could, but it's not clear how much it is necessary. On my
laptop, which has been running ext4 since July, a recent check on my
system showed that I had 1,058,309 inodes used, and of those roughly
one million inodes used, 548 inodes had an extent depth of one, and
exactly 2 inodes had an extent depth of two. All of the other inodes
had 3 or fewer extents, so they all fit inside the current inode's
direct block array. In fact, all but 10,876 inodes were contiguous
(i.e., only needed one extent).
Granted, my laptop is used as a development machine, which means the
bulk of the files are Maildir directories, git repositories, and build
trees, which might not be representative of say, server workloads.
Still, statistics of 98.97% of the files being completely contiguous,
and 99.95% of the files being able to store all of their extents
inside the inode table and not needing to spill to an external extent
tree are pretty impressive, and a good reason for folks to migrate to
ext4 when they have a chance. :-)
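The two percentages follow directly from the counts quoted above. As a quick check, a small sketch (the variable and function names are only for illustration):

```python
# Counts quoted in the message above.
total_inodes = 1_058_309
non_contiguous = 10_876  # inodes needing more than one extent
depth_one = 548          # inodes with an extent depth of one
depth_two = 2            # inodes with an extent depth of two


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


contiguous_pct = percent(total_inodes - non_contiguous, total_inodes)
in_inode_pct = percent(total_inodes - (depth_one + depth_two), total_inodes)

print(f"completely contiguous: {contiguous_pct:.2f}%")
print(f"all extents stored in the inode: {in_inode_pct:.2f}%")
```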
> Does it actually help performance to store the ea in the inode? I would
> think it would not make much difference: if many files have the same
> attributes, the shared ea block would be cached.
I don't use SELinux, so I can't speak to this from personal
experience. People who have enabled it have told me the performance
difference is "dramatic".
It also seemed that with the advent of desktop programs like Beagle,
there were more desktop applications using file-unique extended
attributes, and that my personal battle to tell application programs
that "friends don't let friends use extended attributes" was a losing
fight, and given that the GNOME and KDE application programs have
filesystems engineers vastly outnumbered, it was better to assume that
we weren't going to win this one, as they pick up bad programming
habits from platforms such as Mac OS X. :-)
> Probably, but it is an example (and there probably are others) of
> problems caused by changing the default, so I'm trying to understand why
> ext3 was disturbed in this way rather than just making 256 byte inodes
> the default for only ext4.
We didn't "disturb" ext3; we just changed the default, to reflect the
changing usage of filesystems. If a system administrator is convinced
that they know better, they can always adjust /etc/mke2fs.conf, and
change the default to something else.
I know of only one other problem that was turned up when we changed
the default, which was that some boot loaders didn't know how to deal
with 256 byte inodes. But that got fixed pretty fast.
Realistically, any userspace program that uses libext2fs would have
been fine, since it always did the right thing with larger inode
sizes. It was only programs that didn't use the standard ext2
library that would get bitten, and there are very few of those around.
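For anyone maintaining a tool that parses ext2/3/4 without libext2fs, the point is to read the inode size from the superblock rather than hard-code 128. The following is only a sketch, not the libext2fs code and not the Windows driver's. It assumes the standard on-disk layout (primary superblock at byte offset 1024, little-endian fields, s_magic at offset 56, s_rev_level at 76, s_inode_size at 88); the function names are made up for illustration.

```python
import struct

SUPERBLOCK_OFFSET = 1024    # primary superblock starts 1024 bytes into the device
SUPERBLOCK_SIZE = 1024
EXT2_MAGIC = 0xEF53
GOOD_OLD_INODE_SIZE = 128   # fixed inode size on revision-0 filesystems


def inode_size_from_superblock(sb):
    """Return the on-disk inode size given the raw superblock bytes."""
    (magic,) = struct.unpack_from("<H", sb, 56)       # s_magic
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    (rev_level,) = struct.unpack_from("<I", sb, 76)   # s_rev_level
    if rev_level == 0:
        # Revision 0 predates the s_inode_size field; the size is always 128.
        return GOOD_OLD_INODE_SIZE
    (inode_size,) = struct.unpack_from("<H", sb, 88)  # s_inode_size
    return inode_size


def read_inode_size(path):
    """Hypothetical helper: read the inode size from a device or image file."""
    with open(path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        return inode_size_from_superblock(f.read(SUPERBLOCK_SIZE))
```

A reader that then locates inode N by multiplying its index within the inode table by this value, instead of by a constant 128, should cope with either default.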
- Ted