From: Theodore Tso <tytso@mit.edu>
To: Andreas Dilger <adilger@sun.com>
Cc: Kalpak Shah <kalpak.shah@gmail.com>,
Kalpak Shah <Kalpak.Shah@sun.com>,
linux-ext4 <linux-ext4@vger.kernel.org>,
Mingming Cao <cmm@us.ibm.com>
Subject: Re: [PATCH 2/2] Large EAs
Date: Wed, 26 Nov 2008 19:35:55 -0500
Message-ID: <20081127003555.GD14101@mit.edu>
In-Reply-To: <20081126214929.GZ3186@webber.adilger.int>
On Wed, Nov 26, 2008 at 02:49:29PM -0700, Andreas Dilger wrote:
>
> One benefit I think is that at least the orphaned EA inode can be
> cleaned up instead of lingering in the middle of the shared EA tree.
>
> Another benefit of having separate EAs is that it makes it tractable to
> modify very large EAs. Otherwise, if there are a number of large
> EAs shared in a single tree they would all have to be modified in order
> to store a larger value for an EA in the middle of the tree.
I guess I didn't make myself clear. I was *not* suggesting that we
share EA's in one inode, or in one extent tree. What I suggested was
that, instead of having a pointer to an inode, if the value of the EA
is less than half the blocksize, it is stored in the EA block. If it
is between 50% and 100% of the blocksize, instead of pointing at an
inode, we point to a block. If it is greater than a blocksize, we
point at a block containing an EA tree. (This means that for a large
EA, with a 4k blocksize, the average space overhead is 6k: 4k for the
extent block, plus 2k for the fragmentation cost.)
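To make the three size classes concrete, here is a minimal sketch of the placement decision. It is only an illustration, not code from any patch: the names `Placement` and `classify_ea` are hypothetical, the 4096-byte default blocksize is an assumption, and the description above does not say which class a value of exactly half a block or exactly one block falls into, so the sketch assumes both go to the single-block case.

```python
from enum import Enum


class Placement(Enum):
    """Hypothetical labels for the three storage classes described above."""
    IN_EA_BLOCK = "value stored directly in the EA block"
    SINGLE_BLOCK = "EA entry points at one dedicated block"
    TREE_BLOCK = "EA entry points at a block containing a tree"


def classify_ea(value_size: int, block_size: int = 4096) -> Placement:
    """Pick a storage class for an EA value of value_size bytes.

    Assumption: values of exactly block_size / 2 or exactly block_size
    are treated as the single-block case; the proposal leaves those
    boundaries unspecified.
    """
    if value_size < 0 or block_size <= 0:
        raise ValueError("sizes must be non-negative, blocksize positive")
    if 2 * value_size < block_size:
        # Less than half a block: keep it in the EA block itself.
        return Placement.IN_EA_BLOCK
    if value_size <= block_size:
        # Between 50% and 100% of a block: point at a single block.
        return Placement.SINGLE_BLOCK
    # Larger than one block: point at a block holding a tree.
    return Placement.TREE_BLOCK
```

With the assumed 4k blocksize, a 100-byte value stays in the EA block, a 3000-byte value gets its own block, and a 64k value goes through the tree block. No inode is consumed in any of the three cases.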
So this scheme very much uses separate EA's, and does not pack all of
the EA's into a single tree. It is deliberately kept simple precisely
because, like you, I don't think it's worth it to optimize EA's. On
the other hand, running out of inodes is a big problem, and dynamic
inodes are a far more complicated issue, especially if we don't have
64-bit inode support in the kernel and in userspace, and we need to
worry about locality issues and how dynamic inodes work with online
resizing.
The tradeoff is that my scheme doesn't burn an inode for each large
EA, but for EA's greater than a blocksize, we chew an extra block's
worth of overhead. Personally, I think it's a worthwhile tradeoff.
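The overhead side of this tradeoff can be worked through with a little arithmetic. The sketch below is an illustration under stated assumptions, not a measurement: the function names are hypothetical, the blocksize is assumed to be 4096 bytes, the tree is assumed to fit in a single block, and "fragmentation" is taken to mean the unused tail of the last data block.

```python
import math


def large_ea_overhead(value_size: int, block_size: int = 4096) -> int:
    """Bytes consumed beyond the value itself for an EA larger than a block.

    Assumes one extra block for the tree plus the unused tail of the
    last data block (the fragmentation cost).
    """
    if value_size <= block_size:
        raise ValueError("only EAs larger than one block use the tree")
    data_blocks = math.ceil(value_size / block_size)
    tail_waste = data_blocks * block_size - value_size
    return block_size + tail_waste


def average_large_ea_overhead(block_size: int = 4096) -> int:
    """Average overhead if the tail waste averages half a block."""
    return block_size + block_size // 2
```

For a 4k blocksize, `average_large_ea_overhead()` gives 6144 bytes, which matches the 6k average quoted earlier. The per-EA figure ranges from exactly one block (value size a multiple of the blocksize) to just under two blocks, and in every case the inode count is untouched.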
- Ted