From: Andreas Dilger <adilger@clusterfs.com>
To: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: cmm@us.ibm.com, coly li <colyli@gmail.com>,
"Jose R. Santos" <jrs@us.ibm.com>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: block groups with no inode tables
Date: Tue, 10 Jul 2007 22:50:29 -0600
Message-ID: <20070711045029.GK6417@schatzie.adilger.int>
In-Reply-To: <1184094545.13379.20.camel@kleikamp.austin.ibm.com>
On Jul 10, 2007 14:09 -0500, Dave Kleikamp wrote:
> Assuming you mean the parent directory? An inode isn't tied to a
> specific parent.
>
> ln dir1/file1 dir2/
> mv dir1/file1 dir3/
> rmdir dir1
>
> What happens to the inode?

The inode stays in the same place, and the block maps of the directories
are changed to enclose the inode. In ideal (== normal) circumstances,
inodes are allocated within a directory sequentially, which would also
result in linear inode block allocation, great for extent-mapped files.
In cases like the above you will get fragmented IO patterns, but that is
already true for existing directories.

> I really don't think that the directory is the right place to store an inode.
There are actually some performance benefits from this; see
http://citeseer.ist.psu.edu/ganger97embedded.html

Each inode would be a disk block, or possibly a few (slightly larger than
now) inodes per block, on the order of 1kB or more. This allows small
files to be packed into the inode as well (as an EA), or alternately
allows many extents in the inode for huge files, or lots of inline EAs.

I've also got a plan to overcome the hard-link limitations in that paper,
by storing the filename of an inode as an EA in the inode, prefixed by
the inode number & generation of the parent. When doing a readdir or
lookup, we know which parent directory we are looking in, so we need
only consider names in that directory. When doing a readdir, we can
immediately list all of this inode's names together. The caveat is that
we need a flexible EA scheme to handle this, maybe a directory with more
EAs in it?

The one thing that I'm not sure about is how to handle the case where
inode blocks are allocated in relatively random order. I'd _like_ to
be able to avoid the POSIX telldir/seekdir problem by doing readdir()
in block order, but that also means that if we allocate an inode block
between two existing inode blocks in a directory, we should "insert"
the block into the directory instead of, e.g., appending it. That means
the file offset of an entry in a directory is not constant, but maybe it
is OK to return the physical block number for telldir?

We would still have a hash for the files, but instead of one entry per
block as it is now, it would need a leaf entry for each name, since an
inode can have many names and would appear in multiple hash buckets.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.