From: Andreas Dilger <adilger@clusterfs.com>
To: Valerie Henson <val_henson@linux.intel.com>
Cc: Andrew Morton <akpm@osdl.org>, Jeff Garzik <jeff@garzik.org>,
Matthew Wilcox <matthew@wil.cx>,
Arjan van de Ven <arjan@linux.intel.com>,
ext2-devel <ext2-devel@lists.sourceforge.net>,
linux-kernel@vger.kernel.org, Linus Torvalds <torvalds@osdl.org>,
cmm@us.ibm.com, linux-fsdevel@vger.kernel.org,
Alex Tomas <alex@clusterfs.com>
Subject: Re: Continuation Inodes Explained! (was Re: [RFC 0/13] extents and 48bit ext3)
Date: Fri, 9 Jun 2006 23:25:02 -0600
Message-ID: <20060610052502.GY5964@schatzie.adilger.int>
In-Reply-To: <20060610032623.GG10524@goober>
On Jun 09, 2006 20:26 -0700, Valerie Henson wrote:
> To be honest, continuation inodes and these ext3 patches are
> addressing different problems. ext3 48-bit extents are an advanced
> solution to a complex problem - growing ext3 beyond 8TB while keeping
> as much of the existing on-disk format and associated stable code as
> possible.
The 48-bit support was actually only a small part of the original reason
for extents, though it seems to be the most popular one right now. The
other issues that are being addressed are:
- performance issues, like avoiding the 0.1%+ indirect block metadata
overhead for each file (which is bad for the cache, and also hurts unlinks)
- the extent index blocks are also more robust than indirect blocks (they
have a magic and internally verifiable structure, and the possibility
to easily add metadata checksums and extent->inode backpointers to
allow improved filesystem checking). With large ext3 filesystems the
{d,t,}indirect blocks can have random garbage in them, and there is no
way for the kernel to know unless it overlaps with other fixed metadata.
- the ability to do things like preallocation of files efficiently (via
uninitialized extents), instead of zero-filling the whole file.
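To make the first two points concrete, here is a minimal sketch (a model in Python, not the kernel code). It assumes 4 KiB blocks with 4-byte block pointers for the overhead figure, and an extent header of four 16-bit fields plus a 32-bit generation, with magic 0xF30A, modelled on the extents patches. The names EXTENT_HEADER, EXTENT_MAGIC, indirect_overhead() and check_extent_header(), and the depth limit, are illustrative assumptions.

```python
import struct

# Assumed header layout (little-endian): magic, entries, max, depth as u16,
# then a u32 generation. Modelled on the extents patches, not copied from them.
EXTENT_HEADER = struct.Struct("<HHHHI")
EXTENT_MAGIC = 0xF30A  # assumed magic value


def indirect_overhead(block_size=4096, pointer_size=4):
    """Metadata blocks per data block for single-indirect mapping."""
    pointers_per_block = block_size // pointer_size
    # One indirect block is needed for every `pointers_per_block` data blocks.
    return 1 / pointers_per_block


def check_extent_header(block):
    """Return (ok, reason) for the header at the start of an index block."""
    if len(block) < EXTENT_HEADER.size:
        return False, "block too short"
    magic, entries, max_entries, depth, _gen = EXTENT_HEADER.unpack_from(block)
    if magic != EXTENT_MAGIC:
        return False, "bad magic"
    if max_entries == 0 or entries > max_entries:
        return False, "entry count out of range"
    if depth > 5:  # hypothetical sanity limit for this sketch
        return False, "implausible depth"
    return True, "ok"
```

With the assumed sizes, indirect_overhead() gives 1/1024, which is just under 0.1%; double and triple indirect blocks add a little on top of that. A block of random bytes fails the magic test almost every time, whereas an indirect block has no field that could be tested in the same way.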
> Continuation inodes/chunkfs are an idea Arjan and I came up with,
> inspired loosely by the ext2 dirty bit code. The problem we were
> trying to solve is how to isolate the effects of file system
> corruption (from crash, bug, or I/O error) so that we didn't have to
> run fsck over the entire file system in order to repair it.
I think this is a great idea, and one that is very similar to what
we are doing with ext3 filesystems in Lustre. There is definitely
a desire to harden the ext3 code in many ways against such failures,
and being able to check independent parts of the filesystem is a
very desirable part of this.
> The solution we came up with is to create a "continuation inode" in
> every file system chunk which contains data for a particular file or
> directory. For example, if file "foo" has its inode in chunk A, and
> some file data in chunk B, we would create a continuation inode in
> chunk B. The continuation inode has a back pointer to the parent
> inode.
This needs some extra data in the directory entry, which I've already
been thinking about for ext3, so if you are looking at implementing
this for ext3 I'd be happy to share some ideas.
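As a rough illustration of the scheme described in the quoted text, the following sketch models chunks as dictionaries of inodes and creates a continuation inode with a back pointer the first time a file gets data in a foreign chunk. It is only a toy under stated assumptions: the Inode and Chunk classes, the (chunk name, inode number) reference and store_block() are all hypothetical, and nothing here reflects an actual on-disk format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

InodeRef = Tuple[str, int]  # (chunk name, inode number): hypothetical addressing


@dataclass
class Inode:
    number: int
    blocks: List[int] = field(default_factory=list)
    parent: Optional[InodeRef] = None  # back pointer, set on continuation inodes only


@dataclass
class Chunk:
    name: str
    inodes: Dict[int, Inode] = field(default_factory=dict)
    next_ino: int = 1

    def new_inode(self, parent: Optional[InodeRef] = None) -> Inode:
        inode = Inode(self.next_ino, parent=parent)
        self.inodes[inode.number] = inode
        self.next_ino += 1
        return inode


def store_block(home: Chunk, head: Inode, target: Chunk, block: int) -> Inode:
    """Record a data block in `target` for the file whose inode `head` is in `home`."""
    if target is home:
        head.blocks.append(block)
        return head
    ref = (home.name, head.number)
    # Reuse the file's continuation inode in this chunk if one already exists.
    for inode in target.inodes.values():
        if inode.parent == ref:
            break
    else:
        inode = target.new_inode(parent=ref)
    inode.blocks.append(block)
    return inode
```

For file "foo" with its inode in chunk A, storing two blocks in chunk B yields a single continuation inode in B whose parent field points back at ("A", inode number of foo). Every block in B is then owned by an inode that also lives in B, which is what would let B be checked on its own.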
> One interesting possibility would be to combine this with the ext2
> dirty bit patches.
Put on your asbestos vest before suggesting any changes to ext2 :-).
> If we implement chunkfs on top of this, we could get away with fsck'ing
> only a few of the file systems each time, getting ext2-style performance
> with ext3-style fast recovery.
While fast recovery is one aspect of ext3 journaling, the other is that
it allows multiple filesystem changes to be made atomically: they are
rolled back as a set if the system crashes in the middle.
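The all-or-nothing behaviour can be sketched with a toy recovery routine. This is a simplified model, not how ext3/JBD is implemented: the record format ("begin", "write", "commit") and the recover() function are assumptions made for illustration only.

```python
def recover(disk, journal):
    """Return the state after recovery: only committed transactions are applied."""
    state = dict(disk)
    pending = {}
    for record in journal:
        kind = record[0]
        if kind == "begin":
            pending = {}
        elif kind == "write":
            _, key, value = record
            pending[key] = value  # staged, not yet visible
        elif kind == "commit":
            state.update(pending)  # the whole set becomes visible at once
            pending = {}
    # Writes with no commit record after them are dropped as a set.
    return state
```

If the journal ends after a "begin" and a few "write" records but before the matching "commit", recover() returns the state as of the last commit, so a directory entry and its link count (for example) never get updated separately.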
> We'll talk about it more next week, I hope.
I look forward to it.
Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.