From: Jeff Garzik
To: Valerie Henson
Cc: Andrew Morton, Matthew Wilcox, Arjan van de Ven, ext2-devel, linux-kernel@vger.kernel.org, Linus Torvalds, cmm@us.ibm.com, linux-fsdevel@vger.kernel.org, Alex Tomas, Andreas Dilger
Subject: Re: Continuation Inodes Explained! (was Re: [RFC 0/13] extents and 48bit ext3)
Date: Sat, 10 Jun 2006 10:22:08 -0400
Message-ID: <448AD590.40005@garzik.org>
In-Reply-To: <20060610032623.GG10524@goober>
References: <1149816055.4066.60.camel@dyn9047017069.beaverton.ibm.com> <4488E1A4.20305@garzik.org> <20060609083523.GQ5964@schatzie.adilger.int> <44898EE3.6080903@garzik.org> <20060609153116.GM1651@parisc-linux.org> <20060610032623.GG10524@goober>

Valerie Henson wrote:
> So what the heck are continuation inodes?  Actually, we named this
> "chunkfs" - not particularly descriptive, maybe continuation inodes is
> a better term.
[...]
> The basic idea is to create a bunch of small file systems - chunks -
> which look like one big file system to the administrator.  Major

Back when I was still playing with my experimental filesystem, one of the short-list features I was planning to implement was allocating both metadata and data from the same underlying data store: essentially, collections of "buckets" for data. The data store would be a succession of progressively smaller buckets. Typical bucket sizes (chosen by the admin) on a single filesystem might be: 1G, 128M, 4M, 1M, 64k, 4k. The largest (top-most) bucket is the fundamental unit of allocation for the filesystem, from which all other metadata and data are allocated.
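To make the bucket hierarchy a bit more concrete, here is a minimal C sketch of one possible size-selection policy, assuming a simple "smallest bucket class that fits the request" rule. The email does not say how the "appropriate" bucket is chosen, so the policy and all names here (bucket_sizes, pick_bucket_class, children_per_bucket) are hypothetical illustrations, not the design of the experimental filesystem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical admin-chosen bucket sizes, largest (top-level) first,
 * matching the example list: 1G, 128M, 4M, 1M, 64k, 4k. */
static const uint64_t bucket_sizes[] = {
    1ULL << 30, 128ULL << 20, 4ULL << 20, 1ULL << 20, 64ULL << 10, 4ULL << 10,
};
#define NR_BUCKET_CLASSES (sizeof(bucket_sizes) / sizeof(bucket_sizes[0]))

/* Assumed policy: return the index of the smallest bucket class that can
 * hold `bytes`, or -1 if the request exceeds the top-level bucket (such a
 * request would need to span several top-level buckets, not handled here). */
static int pick_bucket_class(uint64_t bytes)
{
    int best = -1;

    for (size_t i = 0; i < NR_BUCKET_CLASSES; i++) {
        if (bucket_sizes[i] >= bytes)
            best = (int)i; /* keep scanning: later entries are smaller */
    }
    return best;
}

/* Number of sub-buckets of class `child` carved from one bucket of class
 * `parent`; both are indices into bucket_sizes, parent being the larger. */
static uint64_t children_per_bucket(int parent, int child)
{
    assert(parent >= 0 && parent <= child && (size_t)child < NR_BUCKET_CLASSES);
    return bucket_sizes[parent] / bucket_sizes[child];
}
```

Under this assumed policy, a 3000-byte directory block lands in the 4k class (index 5), a 200k file in the 1M class (index 3), and a single 1G top-level bucket splits into 8 buckets of 128M.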
So in my example above, the 1G bucket is analogous to a single chunk in chunkfs, and any number of 1G buckets -- from any number of block devices -- may comprise a single filesystem. New inode tables, bitmap chunks, directories, large files, etc. are all allocated from an "appropriate" bucket.

IMO this type of solution provides fsck-friendly isolation and adds enough flexibility for things like delayed allocation, metadata-is-a-file, etc.

Jeff