* Updated ext4/jbd2 patches based on 2.6.19-rc1
@ 2006-10-05 18:23 Dave Kleikamp
2006-10-05 20:19 ` Andrew Morton
` (4 more replies)
0 siblings, 5 replies; 45+ messages in thread
From: Dave Kleikamp @ 2006-10-05 18:23 UTC
To: Andrew Morton; +Cc: ext4 development
I have rebuilt the ext4/jbd2 patches against linux-2.6.19-rc1. The
patch set is located at
ftp://kernel.org/pub/linux/kernel/people/shaggy/ext4/2.6.19-rc1/ext4-patches-2.6.19-rc1.tar.gz
Broken out patches in
ftp://kernel.org/pub/linux/kernel/people/shaggy/ext4/2.6.19-rc1/ext4-patches
The patches begin with exact copies of the rc1 version of ext3 and jbd,
so there are no ext3/jbd patches currently in mainline that need to be
applied to the new code. I'll continue to watch ext3/jbd for patches
that need to be ported to ext4/jbd2.
--
David Kleikamp
IBM Linux Technology Center
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
2006-10-05 18:23 Updated ext4/jbd2 patches based on 2.6.19-rc1 Dave Kleikamp
@ 2006-10-05 20:19 ` Andrew Morton
2006-10-05 23:25 ` Linus Torvalds
2006-10-05 21:59 ` Andrew Morton
` (3 subsequent siblings)
4 siblings, 1 reply; 45+ messages in thread
From: Andrew Morton @ 2006-10-05 20:19 UTC
To: Dave Kleikamp; +Cc: ext4 development, Linus Torvalds

On Thu, 05 Oct 2006 13:23:30 -0500
Dave Kleikamp <shaggy@austin.ibm.com> wrote:

> I have rebuilt the ext4/jbd2 patches against linux-2.6.19-rc1. The
> patch set is located at
> ftp://kernel.org/pub/linux/kernel/people/shaggy/ext4/2.6.19-rc1/ext4-patches-2.6.19-rc1.tar.gz
>
> Broken out patches in
> ftp://kernel.org/pub/linux/kernel/people/shaggy/ext4/2.6.19-rc1/ext4-patches
>
> The patches begin with exact copies of the rc1 version of ext3 and jbd,
> so there are no ext3/jbd patches currently in mainline that need to be
> applied to the new code. I'll continue to watch ext3/jbd for patches
> that need to be ported to ext4/jbd2.

OK...

Linus, what's the best way of doing this? Will git dtrt with a patch which
copies files, or would a script which does the mkdir's and cp's be better?
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
2006-10-05 20:19 ` Andrew Morton
@ 2006-10-05 23:25 ` Linus Torvalds
2006-10-06 12:50 ` Dave Kleikamp
0 siblings, 1 reply; 45+ messages in thread
From: Linus Torvalds @ 2006-10-05 23:25 UTC
To: Andrew Morton; +Cc: Dave Kleikamp, ext4 development

On Thu, 5 Oct 2006, Andrew Morton wrote:
>
> Linus, what's the best way of doing this? Will git dtrt with a patch which
> copies files, or would a script which does the mkdir's and cp's be better?

Git should dtrt.

In fact, if you use

    git diff -C

it should generate the appropriate "file copied" things automatically, and
you don't need any huge file at all, you'll get a "patch" that looks
something like

    diff --git a/fs/ext3/inode.c b/fs/ext4/inode.c
    similarity index 100%
    copy from fs/ext3/inode.c
    copy to fs/ext4/inode.c
    diff --git a/fs/ext3/super.c b/fs/ext4/super.c
    similarity index 98%
    copy from fs/ext3/super.c
    copy to fs/ext4/super.c
    index xyz..zzy 100644
    --- a/fs/ext3/super.c
    +++ b/fs/ext4/super.c
    .. small diff that changes "ext3" to "ext4" goes here ..

ie you'll effectively get the best of both worlds: a "diff", but one that
is actually readable and shows what is going on.

I hate to beat my own drum (not really), but git really _is_ a lot better
than anything else out there ;)

Linus
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
2006-10-05 23:25 ` Linus Torvalds
@ 2006-10-06 12:50 ` Dave Kleikamp
2006-10-06 16:11 ` Linus Torvalds
0 siblings, 1 reply; 45+ messages in thread
From: Dave Kleikamp @ 2006-10-06 12:50 UTC
To: Linus Torvalds; +Cc: Andrew Morton, ext4 development

On Thu, 2006-10-05 at 16:25 -0700, Linus Torvalds wrote:
>
> On Thu, 5 Oct 2006, Andrew Morton wrote:
> >
> > Linus, what's the best way of doing this? Will git dtrt with a patch which
> > copies files, or would a script which does the mkdir's and cp's be better?
>
> Git should dtrt.
>
> In fact, if you use
>
>     git diff -C
>
> it should generate the appropriate "file copied" things automatically, and
> you don't need any huge file at all, you'll get a "patch" that looks
> something like
>
>     diff --git a/fs/ext3/inode.c b/fs/ext4/inode.c
>     similarity index 100%
>     copy from fs/ext3/inode.c
>     copy to fs/ext4/inode.c
>     diff --git a/fs/ext3/super.c b/fs/ext4/super.c
>     similarity index 98%
>     copy from fs/ext3/super.c
>     copy to fs/ext4/super.c
>     index xyz..zzy 100644
>     --- a/fs/ext3/super.c
>     +++ b/fs/ext4/super.c
>     .. small diff that changes "ext3" to "ext4" goes here ..
>
>
> ie you'll effectively get the best of both worlds: a "diff", but one that
> is actually readable and shows what is going on.

We haven't been using git to manage ext4 so far, although in hindsight
it probably would have made things easier. I'm assuming that you're
just suggesting this for educational purposes and that git will handle
the patches that Andrew picked up into -mm just fine.

I could re-generate the patches that do the copies from git, but I don't
believe it will be that beneficial at this point.

> I hate to beat my own drum (not really), but git really _is_ a lot better
> than anything else out there ;)

No argument here :-)

Shaggy
--
David Kleikamp
IBM Linux Technology Center
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
2006-10-06 12:50 ` Dave Kleikamp
@ 2006-10-06 16:11 ` Linus Torvalds
0 siblings, 0 replies; 45+ messages in thread
From: Linus Torvalds @ 2006-10-06 16:11 UTC
To: Dave Kleikamp; +Cc: Andrew Morton, ext4 development

On Fri, 6 Oct 2006, Dave Kleikamp wrote:
> On Thu, 2006-10-05 at 16:25 -0700, Linus Torvalds wrote:
>
> We haven't been using git to manage ext4 so far, although in hindsight
> it probably would have made things easier. I'm assuming that you're
> just suggesting this for educational purposes and that git will handle
> the patches that Andrew picked up into -mm just fine.

Well, also that even if you're not using git, what any random person can
do is to actually just import the current state into git, and do the
simple "git diff -C" thing to get the nicer diff.

One of the advantages about git is that "intent" doesn't matter. Git only
tracks cold, hard data. So if you copied a file, git doesn't care one whit
whether you _tell_ it that you copied it or not: it will purely look at
the end result, and say "you copied it" if the file looks the same.

> I could re-generate the patches that do the copies from git, but I don't
> believe it will be that beneficial at this point.

The only advantage (but I'd argue that it's a real one, and very possibly
worth it) is that when this gets sent to me (or anybody else) by email, it
can be sent in a format that is actually readable, instead of sending it
as a huge patch that doesn't actually talk about what it does..

Linus
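[Archive note] The point that git infers copies from content alone, with no recorded "intent", can be illustrated with a toy. The sketch below is not git's copy-detection code (which also scores partial similarity such as the 98% case); it only computes git-style blob IDs, that is SHA-1 over the header "blob <size>\0" followed by the file contents, and reports a newly added path as an exact copy when its ID matches a path in the old tree. The function names and the sample trees are hypothetical.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Return the ID git assigns to a blob with these contents (SHA-1 format)."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def find_exact_copies(old_tree: dict, new_tree: dict) -> dict:
    """Map each path added in new_tree to an old path with identical contents.

    Only 100% matches are found; nothing records that a copy was made,
    the result follows purely from comparing content hashes.
    """
    by_id = {}
    for path in sorted(old_tree):
        by_id.setdefault(blob_id(old_tree[path]), path)

    copies = {}
    for path in sorted(new_tree):
        if path in old_tree:
            continue  # path already existed, so it is not a new copy
        source = by_id.get(blob_id(new_tree[path]))
        if source is not None:
            copies[path] = source
    return copies


if __name__ == "__main__":
    old = {"fs/ext3/inode.c": b"/* inode code */\n"}
    new = dict(old)
    new["fs/ext4/inode.c"] = old["fs/ext3/inode.c"]  # a plain cp, no metadata
    for dst, src in find_exact_copies(old, new).items():
        print("copy from", src)
        print("copy to", dst)
```

With real git, note that `-C` on its own considers as copy sources only files modified in the same change; when the ext3 originals are left untouched, `--find-copies-harder` is likely needed for the copies to be reported.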
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
2006-10-05 18:23 Updated ext4/jbd2 patches based on 2.6.19-rc1 Dave Kleikamp
2006-10-05 20:19 ` Andrew Morton
@ 2006-10-05 21:59 ` Andrew Morton
2006-10-06 0:39 ` Andrew Morton
` (2 subsequent siblings)
4 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2006-10-05 21:59 UTC
To: Dave Kleikamp; +Cc: ext4 development

On Thu, 05 Oct 2006 13:23:30 -0500
Dave Kleikamp <shaggy@austin.ibm.com> wrote:

> I have rebuilt the ext4/jbd2 patches against linux-2.6.19-rc1. The
> patch set is located at
> ftp://kernel.org/pub/linux/kernel/people/shaggy/ext4/2.6.19-rc1/ext4-patches-2.6.19-rc1.tar.gz
>
> Broken out patches in
> ftp://kernel.org/pub/linux/kernel/people/shaggy/ext4/2.6.19-rc1/ext4-patches
>
> The patches begin with exact copies of the rc1 version of ext3 and jbd,
> so there are no ext3/jbd patches currently in mainline that need to be
> applied to the new code. I'll continue to watch ext3/jbd for patches
> that need to be ported to ext4/jbd2.

The only patch I can see at present in -mm and all the git trees is
fs-cache-provide-a-filesystem-specific-syncable-page-bit.patch, which
touches ext3. It renames PageChecked to PageFsMisc and so it'll break the
build nicely.

I'll try to remember to cc this list on jbd and ext3-affecting patches.
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
2006-10-05 18:23 Updated ext4/jbd2 patches based on 2.6.19-rc1 Dave Kleikamp
2006-10-05 20:19 ` Andrew Morton
2006-10-05 21:59 ` Andrew Morton
@ 2006-10-06 0:39 ` Andrew Morton
2006-10-10 6:29 ` Andrew Morton
2006-10-06 3:55 ` Updated ext4/jbd2 patches based on 2.6.19-rc1 Andrew Morton
2006-10-06 4:31 ` Updated ext4/jbd2 patches based on 2.6.19-rc1 Andrew Morton
4 siblings, 1 reply; 45+ messages in thread
From: Andrew Morton @ 2006-10-06 0:39 UTC
To: Dave Kleikamp; +Cc: ext4 development

On Thu, 05 Oct 2006 13:23:30 -0500
Dave Kleikamp <shaggy@austin.ibm.com> wrote:

> I have rebuilt the ext4/jbd2 patches against linux-2.6.19-rc1. The
> patch set is located at
> ftp://kernel.org/pub/linux/kernel/people/shaggy/ext4/2.6.19-rc1/ext4-patches-2.6.19-rc1.tar.gz
>
> Broken out patches in
> ftp://kernel.org/pub/linux/kernel/people/shaggy/ext4/2.6.19-rc1/ext4-patches
>
> The patches begin with exact copies of the rc1 version of ext3 and jbd,
> so there are no ext3/jbd patches currently in mainline that need to be
> applied to the new code. I'll continue to watch ext3/jbd for patches
> that need to be ported to ext4/jbd2.

Could we please have a few nice words about ext4 for the record? Like,
what its features are, how one creates an instance, where to get the
correct userspace tools from, stability level, any known shortcomings,
issues, etc?

Thanks.
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
2006-10-06 0:39 ` Andrew Morton
@ 2006-10-10 6:29 ` Andrew Morton
2006-10-10 7:54 ` Suparna Bhattacharya
0 siblings, 1 reply; 45+ messages in thread
From: Andrew Morton @ 2006-10-10 6:29 UTC
To: Dave Kleikamp, ext4 development

On Thu, 5 Oct 2006 17:39:33 -0700
Andrew Morton <akpm@osdl.org> wrote:

> Could we please have a few nice words about ext4 for the record? Like,
> what its features are, how one creates an instance, where to get the
> correct userspace tools from, stability level, any known shortcomings,
> issues, etc?

So I guess I get to write this.
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
2006-10-10 6:29 ` Andrew Morton
@ 2006-10-10 7:54 ` Suparna Bhattacharya
2006-10-10 8:14 ` Andrew Morton
0 siblings, 1 reply; 45+ messages in thread
From: Suparna Bhattacharya @ 2006-10-10 7:54 UTC
To: Andrew Morton; +Cc: Dave Kleikamp, ext4 development

On Mon, Oct 09, 2006 at 11:29:27PM -0700, Andrew Morton wrote:
> On Thu, 5 Oct 2006 17:39:33 -0700
> Andrew Morton <akpm@osdl.org> wrote:
>
> > Could we please have a few nice words about ext4 for the record? Like,
> > what its features are, how one creates an instance, where to get the
> > correct userspace tools from, stability level, any known shortcomings,
> > issues, etc?
>
> So I guess I get to write this.

Hopefully not :)
We should be able to put something together for a start. Where should this
reside ? Under Documentation/filesystems/ext4.txt ?

However, since this is still very clearly a "development" branch at the moment,
with lots of ongoing work both on the kernel and even more so on the
tools side, how much detail are you looking for ?

Regards
Suparna

--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
2006-10-10 7:54 ` Suparna Bhattacharya
@ 2006-10-10 8:14 ` Andrew Morton
2006-10-10 20:02 ` [RFC] [PATCH] Documentation/filesystems/ext4.txt Dave Kleikamp
0 siblings, 1 reply; 45+ messages in thread
From: Andrew Morton @ 2006-10-10 8:14 UTC
To: suparna; +Cc: Dave Kleikamp, ext4 development

On Tue, 10 Oct 2006 13:24:02 +0530
Suparna Bhattacharya <suparna@in.ibm.com> wrote:

> On Mon, Oct 09, 2006 at 11:29:27PM -0700, Andrew Morton wrote:
> > On Thu, 5 Oct 2006 17:39:33 -0700
> > Andrew Morton <akpm@osdl.org> wrote:
> >
> > > Could we please have a few nice words about ext4 for the record? Like,
> > > what its features are, how one creates an instance, where to get the
> > > correct userspace tools from, stability level, any known shortcomings,
> > > issues, etc?
> >
> > So I guess I get to write this.
>
> Hopefully not :)
> We should be able to put something together for a start. Where should this
> reside ? Under Documentation/filesystems/ext4.txt ?

That sounds appropriate. And in the patch changelog.

> However, since this is still very clearly a "development" branch at the moment,
> with lots of ongoing work both on the kernel and even more so on the
> tools side, how much detail are you looking for ?

We should make it as easy as possible for our testers to get up and running
and using all available features. They're skilled, so a simple user guide
which tells them what the features are and how to access them should
suffice, thanks.
* [RFC] [PATCH] Documentation/filesystems/ext4.txt
2006-10-10 8:14 ` Andrew Morton
@ 2006-10-10 20:02 ` Dave Kleikamp
2006-10-10 20:56 ` Andrew Morton
` (2 more replies)
0 siblings, 3 replies; 45+ messages in thread
From: Dave Kleikamp @ 2006-10-10 20:02 UTC
To: Andrew Morton; +Cc: suparna, ext4 development

On Tue, 2006-10-10 at 01:14 -0700, Andrew Morton wrote:
> On Tue, 10 Oct 2006 13:24:02 +0530
> Suparna Bhattacharya <suparna@in.ibm.com> wrote:
>
> > On Mon, Oct 09, 2006 at 11:29:27PM -0700, Andrew Morton wrote:
> > > On Thu, 5 Oct 2006 17:39:33 -0700
> > > Andrew Morton <akpm@osdl.org> wrote:
> > >
> > > > Could we please have a few nice words about ext4 for the record? Like,
> > > > what its features are, how one creates an instance, where to get the
> > > > correct userspace tools from, stability level, any known shortcomings,
> > > > issues, etc?
> > >
> > > So I guess I get to write this.

Sorry I didn't get something written sooner. This is based off of what
you put in the -mm1 announcement.

> > Hopefully not :)
> > We should be able to put something together for a start. Where should this
> > reside ? Under Documentation/filesystems/ext4.txt ?

Suparna put this together and I updated it a bit.

> That sounds appropriate. And in the patch changelog.

How do you want to handle the patch set? I could resend it with more
comments, put it into git, or do you just want to plug the comments into
the patches you are carrying? I can do whatever works best for you.

> > However, since this is still very clearly a "development" branch at the moment,
> > with lots of ongoing work both on the kernel and even more so on the
> > tools side, how much detail are you looking for ?
>
> We should make it as easy as possible for our testers to get up and running
> and using all available features. They're skilled, so a simple user guide
> which tells them what the features are and how to access them should
> suffice, thanks.

This file, ext4.txt, was put together with information from Andrew
Morton, Andreas Dilger, Suparna Bhattacharya, and Ted Ts'o. I copied
the mount options, with the exception of "extents", from ext3.txt, so if
anyone is aware of anything out-of-date, please let me know.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>

diff -Nurp linux-orig/Documentation/filesystems/00-INDEX linux/Documentation/filesystems/00-INDEX
--- linux-orig/Documentation/filesystems/00-INDEX 2006-10-05 07:22:05.000000000 -0500
+++ linux/Documentation/filesystems/00-INDEX 2006-10-06 17:30:59.000000000 -0500
@@ -34,6 +34,8 @@ ext2.txt
     - info, mount options and specifications for the Ext2 filesystem.
 ext3.txt
     - info, mount options and specifications for the Ext3 filesystem.
+ext4.txt
+    - info, mount options and specifications for the Ext4 filesystem.
 files.txt
     - info on file management in the Linux kernel.
 fuse.txt
diff -Nurp linux-orig/Documentation/filesystems/ext4.txt linux/Documentation/filesystems/ext4.txt
--- linux-orig/Documentation/filesystems/ext4.txt 1969-12-31 18:00:00.000000000 -0600
+++ linux/Documentation/filesystems/ext4.txt 2006-10-10 14:25:38.000000000 -0500
@@ -0,0 +1,236 @@
+
+Ext4 Filesystem
+===============
+
+This is a development version of the ext4 filesystem, an advanced level
+of the ext3 filesystem which incorporates scalability and reliability
+enhancements for supporting large filesystems (64 bit) in keeping with
+increasing disk capacities and state-of-the-art feature requirements.
+
+Mailing list: linux-ext4@vger.kernel.org
+
+
+1. Quick usage instructions:
+===========================
+
+  - Grab updated e2fsprogs from
+    ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs-interim/
+    This is a patchset on top of e2fsprogs-1.39, which can be found at
+    ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/
+
+  - It's still mke2fs -j /dev/hda1
+
+  - mount /dev/hda1 /wherever -t ext4dev
+
+  - To enable extents,
+
+        mount /dev/hda1 /wherever -t ext4dev -o extents
+
+  - The filesystem is compatible with the ext3 driver until you add a file
+    which has extents (ie: `mount -o extents', then create a file).
+
+    NOTE: The "extents" mount flag is temporary. It will soon go away and
+    extents will be enabled by the "-O extents" flag to mke2fs or tune2fs
+
+  - When comparing performance with other filesystems, remember that
+    ext3/4 by default offers higher data integrity guarantees than most. So
+    when comparing with a metadata-only journalling filesystem, use `mount -o
+    data=writeback'. And you might as well use `mount -o nobh' too along
+    with it. Making the journal larger than the mke2fs default often helps
+    performance with metadata-intensive workloads.
+
+2. Features
+===========
+
+2.1 Currently available
+
+* ability to use filesystems > 16TB
+* extent format reduces metadata overhead (RAM, IO for access, transactions)
+* extent format more robust in face of on-disk corruption due to magics,
+  internal redundancy in tree
+
+2.1 Previously available, soon to be enabled by default by "mkfs.ext4":
+
+* dir_index and resize inode will be on by default
+* large inodes will be used by default for fast EAs, nsec timestamps, etc
+
+2.2 Candidate features for future inclusion
+
+There are several under discussion, whether they all make it in is
+partly a function of how much time everyone has to work on them:
+
+* improved file allocation (multi-block alloc, delayed alloc; basically done)
+* fix 32000 subdirectory limit (patch exists, needs some e2fsck work)
+* nsec timestamps for mtime, atime, ctime, create time (patch exists,
+  needs some e2fsck work)
+* inode version field on disk (NFSv4, Lustre; prototype exists)
+* reduced mke2fs/e2fsck time via uninitialized groups (prototype exists)
+* journal checksumming for robustness, performance (prototype exists)
+* persistent file preallocation (e.g for streaming media, databases)
+
+Features like metadata checksumming have been discussed and planned for
+a bit but no patches exist yet so I'm not sure they're in the near-term
+roadmap.
+
+The big performance win will come with mballoc and delalloc. CFS has
+been using mballoc for a few years already with Lustre, and IBM + Bull
+did a lot of benchmarking on it. The reason it isn't in the first set of
+patches is partly a manageability issue, and partly because it doesn't
+directly affect the on-disk format (outside of much better allocation)
+so it isn't critical to get into the first round of changes. I believe
+Alex is working on a new set of patches right now.
+
+3. Options
+==========
+
+When mounting an ext4 filesystem, the following options are accepted:
+(*) == default
+
+extents                 ext4 will use extents to address file data. The
+                        file system will no longer be mountable by ext3.
+
+journal=update          Update the ext4 file system's journal to the current
+                        format.
+
+journal=inum            When a journal already exists, this option is ignored.
+                        Otherwise, it specifies the number of the inode which
+                        will represent the ext4 file system's journal file.
+
+journal_dev=devnum      When the external journal device's major/minor numbers
+                        have changed, this option allows the user to specify
+                        the new journal location. The journal device is
+                        identified through its new major/minor numbers encoded
+                        in devnum.
+
+noload                  Don't load the journal on mounting.
+
+data=journal            All data are committed into the journal prior to being
+                        written into the main file system.
+
+data=ordered (*)        All data are forced directly out to the main file
+                        system prior to its metadata being committed to the
+                        journal.
+
+data=writeback          Data ordering is not preserved, data may be written
+                        into the main file system after its metadata has been
+                        committed to the journal.
+
+commit=nrsec (*)        Ext4 can be told to sync all its data and metadata
+                        every 'nrsec' seconds. The default value is 5 seconds.
+                        This means that if you lose your power, you will lose
+                        as much as the latest 5 seconds of work (your
+                        filesystem will not be damaged though, thanks to the
+                        journaling). This default value (or any low value)
+                        will hurt performance, but it's good for data-safety.
+                        Setting it to 0 will have the same effect as leaving
+                        it at the default (5 seconds).
+                        Setting it to very large values will improve
+                        performance.
+
+barrier=1               This enables/disables barriers. barrier=0 disables
+                        it, barrier=1 enables it.
+
+orlov (*)               This enables the new Orlov block allocator. It is
+                        enabled by default.
+
+oldalloc                This disables the Orlov block allocator and enables
+                        the old block allocator. Orlov should have better
+                        performance - we'd like to get some feedback if it's
+                        the contrary for you.
+
+user_xattr              Enables Extended User Attributes. Additionally, you
+                        need to have extended attribute support enabled in the
+                        kernel configuration (CONFIG_EXT4_FS_XATTR). See the
+                        attr(5) manual page and http://acl.bestbits.at/ to
+                        learn more about extended attributes.
+
+nouser_xattr            Disables Extended User Attributes.
+
+acl                     Enables POSIX Access Control Lists support.
+                        Additionally, you need to have ACL support enabled in
+                        the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL).
+                        See the acl(5) manual page and http://acl.bestbits.at/
+                        for more information.
+
+noacl                   This option disables POSIX Access Control List
+                        support.
+
+reservation
+
+noreservation
+
+bsddf (*)               Make 'df' act like BSD.
+minixdf                 Make 'df' act like Minix.
+
+check=none              Don't do extra checking of bitmaps on mount.
+nocheck
+
+debug                   Extra debugging information is sent to syslog.
+
+errors=remount-ro(*)    Remount the filesystem read-only on an error.
+errors=continue         Keep going on a filesystem error.
+errors=panic            Panic and halt the machine if an error occurs.
+
+grpid                   Give objects the same group ID as their parent.
+bsdgroups
+
+nogrpid (*)             New objects have the group ID of their creator.
+sysvgroups
+
+resgid=n                The group ID which may use the reserved blocks.
+
+resuid=n                The user ID which may use the reserved blocks.
+
+sb=n                    Use alternate superblock at this location.
+
+quota
+noquota
+grpquota
+usrquota
+
+bh (*)                  ext4 associates buffer heads to data pages to
+nobh                    (a) cache disk block mapping information
+                        (b) link pages into transaction to provide
+                            ordering guarantees.
+                        "bh" option forces use of buffer heads.
+                        "nobh" option tries to avoid associating buffer
+                        heads (supported only for "writeback" mode).
+
+
+Data Mode
+---------
+There are 3 different data modes:
+
+* writeback mode
+In data=writeback mode, ext4 does not journal data at all. This mode provides
+a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
+mode - metadata journaling. A crash+recovery can cause incorrect data to
+appear in files which were written shortly before the crash. This mode will
+typically provide the best ext4 performance.
+
+* ordered mode
+In data=ordered mode, ext4 only officially journals metadata, but it logically
+groups metadata and data blocks into a single unit called a transaction. When
+it's time to write the new metadata out to disk, the associated data blocks
+are written first. In general, this mode performs slightly slower than
+writeback but significantly faster than journal mode.
+
+* journal mode
+data=journal mode provides full data and metadata journaling. All new data is
+written to the journal first, and then to its final location.
+In the event of a crash, the journal can be replayed, bringing both data and
+metadata into a consistent state. This mode is the slowest except when data
+needs to be read from and written to disk at the same time where it
+outperforms all other modes.
+
+References
+==========
+
+kernel source:  <file:fs/ext4/>
+                <file:fs/jbd2/>
+
+programs:       http://e2fsprogs.sourceforge.net/
+                http://ext2resize.sourceforge.net
+
+useful links:   http://fedoraproject.org/wiki/ext3-devel
+                http://www.bullopensource.org/ext4/

--
David Kleikamp
IBM Linux Technology Center
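[Archive note] To make the option rules in the proposed ext4.txt a little more concrete, here is a small sketch that assembles a mount command line from the options it describes. It only builds an argument list and never runs mount. The helper names (`build_mount_argv`, `effective_commit_seconds`) are made up for illustration, and the checks encode only what the text states: three data modes with ordered as the default, "nobh" supported only for writeback mode, and commit=0 behaving like the 5-second default. Rejecting negative intervals is an added assumption.

```python
DATA_MODES = ("journal", "ordered", "writeback")
DEFAULT_COMMIT_SECONDS = 5  # default stated in the commit=nrsec description


def effective_commit_seconds(nrsec: int) -> int:
    """Return the interval that commit=nrsec results in; 0 means the default."""
    if nrsec < 0:
        # Assumption for this sketch: negative intervals are treated as errors.
        raise ValueError("commit interval must not be negative")
    return DEFAULT_COMMIT_SECONDS if nrsec == 0 else nrsec


def build_mount_argv(device, mountpoint, data="ordered", extents=False,
                     nobh=False, commit=None):
    """Return an argv list for mounting `device` as ext4dev (nothing is run)."""
    if data not in DATA_MODES:
        raise ValueError(f"unknown data mode: {data!r}")
    if nobh and data != "writeback":
        raise ValueError('"nobh" is supported only for data=writeback')

    opts = [f"data={data}"]
    if extents:
        # Per the text, files with extents make the fs unmountable by ext3.
        opts.append("extents")
    if nobh:
        opts.append("nobh")
    if commit is not None:
        opts.append(f"commit={effective_commit_seconds(commit)}")
    return ["mount", "-t", "ext4dev", "-o", ",".join(opts), device, mountpoint]


if __name__ == "__main__":
    argv = build_mount_argv("/dev/hda1", "/wherever",
                            data="writeback", extents=True, nobh=True)
    print(" ".join(argv))
```

The writeback plus nobh combination in the usage lines mirrors the benchmarking advice in the quick usage section; the default call, with no keyword arguments, yields a plain data=ordered mount.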
* Re: [RFC] [PATCH] Documentation/filesystems/ext4.txt
2006-10-10 20:02 ` [RFC] [PATCH] Documentation/filesystems/ext4.txt Dave Kleikamp
@ 2006-10-10 20:56 ` Andrew Morton
2006-10-11 17:03 ` Andreas Dilger
2006-10-12 14:18 ` Valerie Clement
2 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2006-10-10 20:56 UTC
To: Dave Kleikamp; +Cc: suparna, ext4 development, Theodore Ts'o

On Tue, 10 Oct 2006 15:02:35 -0500
Dave Kleikamp <shaggy@austin.ibm.com> wrote:

> > > Hopefully not :)
> > > We should be able to put something together for a start. Where should this
> > > reside ? Under Documentation/filesystems/ext4.txt ?
>
> Suparna put this together and I updated it a bit.

Great, thanks.

> > That sounds appropriate. And in the patch changelog.
>
> How do you want to handle the patch set? I could resend it with more
> comments, put it into git, or do you just want to plug the comments into
> the patches you are carrying? I can do whatever works best for you.

I'll just send what I have now in the next batch, probably this evening.

> + - Grab updated e2fsprogs from
> +   ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs-interim/
> +   This is a patchset on top of e2fsprogs-1.39, which can be found at
> +   ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/

Could we please get a patched tarball up there? The easier we make it for
our testers, the more we get.
* Re: [RFC] [PATCH] Documentation/filesystems/ext4.txt
2006-10-10 20:02 ` [RFC] [PATCH] Documentation/filesystems/ext4.txt Dave Kleikamp
2006-10-10 20:56 ` Andrew Morton
@ 2006-10-11 17:03 ` Andreas Dilger
2006-10-12 14:18 ` Valerie Clement
2 siblings, 0 replies; 45+ messages in thread
From: Andreas Dilger @ 2006-10-11 17:03 UTC
To: Dave Kleikamp; +Cc: Andrew Morton, suparna, ext4 development

On Oct 10, 2006 15:02 -0500, Dave Kleikamp wrote:
> + - It's still mke2fs -j /dev/hda1

I would suggest "mke2fs -j -O dir_index -I 256 /dev/XXX" to be more
representative of what will be used in the future.

> +programs: http://e2fsprogs.sourceforge.net/
> +          http://ext2resize.sourceforge.net

You should likely remove ext2resize from this list, it hasn't got any
support for extent-mapped files.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
* Re: [RFC] [PATCH] Documentation/filesystems/ext4.txt
2006-10-10 20:02 ` [RFC] [PATCH] Documentation/filesystems/ext4.txt Dave Kleikamp
2006-10-10 20:56 ` Andrew Morton
2006-10-11 17:03 ` Andreas Dilger
@ 2006-10-12 14:18 ` Valerie Clement
2 siblings, 0 replies; 45+ messages in thread
From: Valerie Clement @ 2006-10-12 14:18 UTC
To: Dave Kleikamp; +Cc: Andrew Morton, suparna, ext4 development

Dave Kleikamp wrote:
> +2.2 Candidate features for future inclusion
> +
> +There are several under discussion, whether they all make it in is
> +partly a function of how much time everyone has to work on them:
> +
> +* improved file allocation (multi-block alloc, delayed alloc; basically done)
> +* fix 32000 subdirectory limit (patch exists, needs some e2fsck work)
> +* nsec timestamps for mtime, atime, ctime, create time (patch exists,
> +  needs some e2fsck work)
> +* inode version field on disk (NFSv4, Lustre; prototype exists)
> +* reduced mke2fs/e2fsck time via uninitialized groups (prototype exists)
> +* journal checksumming for robustness, performance (prototype exists)
> +* persistent file preallocation (e.g for streaming media, databases)

Could you add "support of larger block group size" ?
Currently, a prototype exists, but we still have tests to do.

Thanks,
Valérie
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
2006-10-05 18:23 Updated ext4/jbd2 patches based on 2.6.19-rc1 Dave Kleikamp
` (2 preceding siblings ...)
2006-10-06 0:39 ` Andrew Morton
@ 2006-10-06 3:55 ` Andrew Morton
2006-10-06 3:58 ` Andrew Morton
` (2 more replies)
2006-10-06 4:31 ` Updated ext4/jbd2 patches based on 2.6.19-rc1 Andrew Morton
4 siblings, 3 replies; 45+ messages in thread
From: Andrew Morton @ 2006-10-06 3:55 UTC
To: Dave Kleikamp; +Cc: ext4 development

On Thu, 05 Oct 2006 13:23:30 -0500
Dave Kleikamp <shaggy@austin.ibm.com> wrote:

> I have rebuilt the ext4/jbd2 patches against linux-2.6.19-rc1. The
> patch set is located at
> ftp://kernel.org/pub/linux/kernel/people/shaggy/ext4/2.6.19-rc1/ext4-patches-2.6.19-rc1.tar.gz
>

So let me see if I have this right.

You grab Alexandre's kit from http://www.bullopensource.org/ext4/20060926/
and a plain old `mke2fs -j' gives a filesystem which will mount as ext3 or
ext4.

If you then mount this filesystem with `-t ext4dev -o extents', it becomes
incompatible with the ext3 driver. Yes?

What else aren't we being told? ;)
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
2006-10-06 3:55 ` Updated ext4/jbd2 patches based on 2.6.19-rc1 Andrew Morton
@ 2006-10-06 3:58 ` Andrew Morton
2006-10-06 10:34 ` Alex Tomas
2006-10-06 4:54 ` Randy Dunlap
2006-10-06 12:21 ` Theodore Tso
2 siblings, 1 reply; 45+ messages in thread
From: Andrew Morton @ 2006-10-06 3:58 UTC
To: Dave Kleikamp, ext4 development; +Cc: Alexandre Ratchov

On Thu, 5 Oct 2006 20:55:26 -0700
Andrew Morton <akpm@osdl.org> wrote:

> You grab Alexandre's kit from http://www.bullopensource.org/ext4/20060926/
> and a plain old `mke2fs -j' gives a filesystem which will mount as ext3 or
> ext4.
>
> If you then mount this filesystem with `-t ext4dev -o extents', it becomes
> incompatible with the ext3 driver. Yes?

`mke2fs -O extents' doesn't work. Should it?
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
2006-10-06 3:58 ` Andrew Morton
@ 2006-10-06 10:34 ` Alex Tomas
0 siblings, 0 replies; 45+ messages in thread
From: Alex Tomas @ 2006-10-06 10:34 UTC
To: Andrew Morton; +Cc: Dave Kleikamp, ext4 development, Alexandre Ratchov

>>>>> Andrew Morton (AM) writes:

AM> On Thu, 5 Oct 2006 20:55:26 -0700
AM> Andrew Morton <akpm@osdl.org> wrote:

>> You grab Alexandre's kit from http://www.bullopensource.org/ext4/20060926/
>> and a plain old `mke2fs -j' gives a filesystem which will mount as ext3 or
>> ext4.
>>
>> If you then mount this filesystem with `-t ext4dev -o extents', it becomes
>> incompatible with the ext3 driver. Yes?

AM> `mke2fs -O extents' doesn't work. Should it?

I believe we keep extents as a mount option for a while, just for
development purposes. though there is an agreement about
mke2fs -O extents, IIRC.

thanks, Alex
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
From: Randy Dunlap @ 2006-10-06 4:54 UTC
To: Andrew Morton; +Cc: Dave Kleikamp, ext4 development

On Thu, 5 Oct 2006 20:55:26 -0700 Andrew Morton wrote:

> On Thu, 05 Oct 2006 13:23:30 -0500
> Dave Kleikamp <shaggy@austin.ibm.com> wrote:
>
> > I have rebuilt the ext4/jbd2 patches against linux-2.6.19-rc1.  The
> > patch set is located at
> > ftp://kernel.org/pub/linux/kernel/people/shaggy/ext4/2.6.19-rc1/ext4-patches-2.6.19-rc1.tar.gz
> >
>
> So let me see if I have this right.
>
> You grab Alexandre's kit from http://www.bullopensource.org/ext4/20060926/
> and a plain old `mke2fs -j' gives a filesystem which will mount as ext3 or
> ext4.
>
> If you then mount this filesystem with `-t ext4dev -o extents', it becomes
> incompatible with the ext3 driver.  Yes?

I thought we s/ext4dev/ext4/ ??

> What else aren't we being told?  ;)

---
~Randy
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
From: Andrew Morton @ 2006-10-06 5:05 UTC
To: Randy Dunlap; +Cc: Dave Kleikamp, ext4 development

On Thu, 5 Oct 2006 21:54:42 -0700
Randy Dunlap <rdunlap@xenotime.net> wrote:

> I thought we s/ext4dev/ext4/ ??

nope.
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
From: Andreas Dilger @ 2006-10-06 5:53 UTC
To: Randy Dunlap; +Cc: Andrew Morton, Dave Kleikamp, ext4 development

On Oct 05, 2006  21:54 -0700, Randy Dunlap wrote:
> On Thu, 5 Oct 2006 20:55:26 -0700 Andrew Morton wrote:
> > If you then mount this filesystem with `-t ext4dev -o extents', it becomes
> > incompatible with the ext3 driver.  Yes?
>
> I thought we s/ext4dev/ext4/ ??

No, we want to leave it at ext4dev for a while, to make it very clear
that this is still under development.  We want to get the existing
patches upstream so they don't become completely unwieldy, and earlier
testing is also good, but it is not yet feature complete.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
From: Andrew Morton @ 2006-10-06 6:04 UTC
To: Andreas Dilger; +Cc: Randy Dunlap, Dave Kleikamp, ext4 development

On Thu, 5 Oct 2006 23:53:05 -0600
Andreas Dilger <adilger@clusterfs.com> wrote:

> On Oct 05, 2006  21:54 -0700, Randy Dunlap wrote:
> > On Thu, 5 Oct 2006 20:55:26 -0700 Andrew Morton wrote:
> > > If you then mount this filesystem with `-t ext4dev -o extents', it becomes
> > > incompatible with the ext3 driver.  Yes?
> >
> > I thought we s/ext4dev/ext4/ ??
>
> No, we want to leave it at ext4dev for a while, to make it very clear
> that this is still under development.  We want to get the existing
> patches upstream so they don't become completely unwieldy, and earlier
> testing is also good, but it is not yet feature complete.
>

What features are missing?

Heck, what features does it have now?  Guys, we cannot release this thing
to the public without telling them what it is, how to use it, where to get
the tools from and what the roadmap is.
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
From: Andreas Dilger @ 2006-10-06 6:41 UTC
To: Andrew Morton; +Cc: Randy Dunlap, Dave Kleikamp, ext4 development

On Oct 05, 2006  23:04 -0700, Andrew Morton wrote:
> On Thu, 5 Oct 2006 23:53:05 -0600
> Andreas Dilger <adilger@clusterfs.com> wrote:
> > No, we want to leave it at ext4dev for a while, to make it very clear
> > that this is still under development.  We want to get the existing
> > patches upstream so they don't become completely unwieldy, and earlier
> > testing is also good, but it is not yet feature complete.
> >
>
> What features are missing?

There are several under discussion; whether they all make it in is
partly a function of how much time everyone has to work on them:
- improved file allocation (multi-block alloc, delayed alloc; basically done)
- fix 32000 subdirectory limit (patch exists, needs some e2fsck work)
- nsec timestamps for mtime, atime, ctime, create time (patch exists,
  needs some e2fsck work)
- inode version field on disk (NFSv4, Lustre; prototype exists)
- reduced mke2fs/e2fsck time via uninitialized groups (prototype exists)
- journal checksumming for robustness, performance (prototype exists)

Features like metadata checksumming have been discussed and planned for
a bit but no patches exist yet so I'm not sure they're in the near-term
roadmap.

> Heck, what features does it have now?  Guys, we cannot release this thing
> to the public without telling them what it is, how to use it, where to get
> the tools from and what the roadmap is.

Features now:
- ability to use filesystems > 16TB
- extent format reduces metadata overhead (RAM, IO for access, transactions)
- extent format more robust in face of on-disk corruption due to magics,
  internal redundancy in tree

Features soon (previously available, to be enabled by default by "mkefs.ext4"):
- dir_index and resize inode will be on by default
- large inodes will be used by default for fast EAs, nsec timestamps, etc

Other features as above patches are committed.

The big performance win will come with mballoc and delalloc.  CFS has
been using mballoc for a few years already with Lustre, and IBM + Bull
did a lot of benchmarking on it.  The reason it isn't in the first set of
patches is partly a manageability issue, and partly because it doesn't
directly affect the on-disk format (outside of much better allocation)
so it isn't critical to get into the first round of changes.  I believe
Alex is working on a new set of patches right now.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
From: Andrew Morton @ 2006-10-06 6:50 UTC
To: Andreas Dilger; +Cc: Randy Dunlap, Dave Kleikamp, ext4 development

On Fri, 6 Oct 2006 00:41:03 -0600
Andreas Dilger <adilger@clusterfs.com> wrote:

> The big performance win will come with mballoc and delalloc.  CFS has
> been using mballoc for a few years already with Lustre, and IBM + Bull
> did a lot of benchmarking on it.  The reason it isn't in the first set of
> patches is partly a manageability issue, and partly because it doesn't
> directly affect the on-disk format (outside of much better allocation)
> so it isn't critical to get into the first round of changes.  I believe
> Alex is working on a new set of patches right now.

Are you sure that these things will improve allocation much?  Reservations
made a big improvement there.
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
From: Alex Tomas @ 2006-10-06 10:31 UTC
To: Andrew Morton
Cc: Andreas Dilger, Randy Dunlap, Dave Kleikamp, ext4 development

>>>>> Andrew Morton (AM) writes:

 AM> On Fri, 6 Oct 2006 00:41:03 -0600
 AM> Andreas Dilger <adilger@clusterfs.com> wrote:

 >> The big performance win will come with mballoc and delalloc.  CFS has
 >> been using mballoc for a few years already with Lustre, and IBM + Bull
 >> did a lot of benchmarking on it.  The reason it isn't in the first set of
 >> patches is partly a manageability issue, and partly because it doesn't
 >> directly affect the on-disk format (outside of much better allocation)
 >> so it isn't critical to get into the first round of changes.  I believe
 >> Alex is working on a new set of patches right now.

 AM> Are you sure that these things will improve allocation much?  Reservations
 AM> made a big improvement there.

it depends on underlying storage and workload. mballoc uses buddy
internally. it's much simpler and cheaper to find free 2^N blocks
compared to bitmap. this is especially important for arrays like
DDN and raid5/6 because they require stripe-aligned/-sized requests
for good throughput. also, last mballoc takes logical block into
account and can preallocate few chunks at different logical offsets
for a file. imagine torrent downloading different pieces from few peers.

thanks, Alex
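The remark about stripe-aligned, stripe-sized requests comes down to simple
arithmetic. What follows is only a userspace sketch, not mballoc code: both
helper names are hypothetical, and the stripe width is assumed to be given in
filesystem blocks. On parity RAID, a write that covers whole stripes lets
parity be computed from the new data alone, whereas a partial-stripe write
generally forces a read-modify-write cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round a block number up to the next stripe
 * boundary.  `stripe` is the stripe width in filesystem blocks. */
uint64_t stripe_align_up(uint64_t block, uint64_t stripe)
{
	assert(stripe > 0);
	return (block + stripe - 1) / stripe * stripe;
}

/* Hypothetical helper: true when a request starts on a stripe boundary
 * and covers a whole number of stripes. */
int is_full_stripe_request(uint64_t start, uint64_t len, uint64_t stripe)
{
	assert(stripe > 0);
	return len > 0 && start % stripe == 0 && len % stripe == 0;
}
```

An allocator that can cheaply find free, aligned 2^N chunks is in a good
position to issue requests for which is_full_stripe_request() holds.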
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
From: Andrew Morton @ 2006-10-06 13:57 UTC
To: Alex Tomas; +Cc: Andreas Dilger, Randy Dunlap, Dave Kleikamp, ext4 development

On Fri, 06 Oct 2006 14:31:59 +0400
Alex Tomas <alex@clusterfs.com> wrote:

> >>>>> Andrew Morton (AM) writes:
>
>  AM> On Fri, 6 Oct 2006 00:41:03 -0600
>  AM> Andreas Dilger <adilger@clusterfs.com> wrote:
>
>  >> The big performance win will come with mballoc and delalloc.  CFS has
>  >> been using mballoc for a few years already with Lustre, and IBM + Bull
>  >> did a lot of benchmarking on it.  The reason it isn't in the first set of
>  >> patches is partly a manageability issue, and partly because it doesn't
>  >> directly affect the on-disk format (outside of much better allocation)
>  >> so it isn't critical to get into the first round of changes.  I believe
>  >> Alex is working on a new set of patches right now.
>
>  AM> Are you sure that these things will improve allocation much?  Reservations
>  AM> made a big improvement there.
>
> it depends on underlying storage and workload. mballoc uses buddy
> internally. it's much simpler and cheaper to find free 2^N blocks
> compared to bitmap.

So mballoc's application is to save CPU cycles?

> this is especially important for arrays like
> DDN and raid5/6 because they require stripe-aligned/-sized requests
> for good throughput.

Does this not imply that there needs to be new linkage between the
filesystem and the lower layers?  So that raid/etc can inform the
filesystem driver about its alignment and striping requirements?

> also, last mballoc takes logical block into
> account and can preallocate few chunks at different logical offsets
> for a file. imagine torrent downloading different pieces from few peers.

hm.  You don't need anything as exotic as bittorrent to show up problems in
that area:

box:/usr/src/25> sudo bmap vmlinux | wc -l
1152
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
From: alex @ 2006-10-07 20:09 UTC
To: Andrew Morton
Cc: Andreas Dilger, Randy Dunlap, Dave Kleikamp, ext4 development, alex

Hi

>>>>> Alex Tomas (AT) writes:

 >>> it depends on underlying storage and workload. mballoc uses buddy
 >>> internally. it's much simpler and cheaper to find free 2^N blocks
 >>> compared to bitmap.

 AM> So mballoc's application is to save CPU cycles?

AFAIU, we don't implement complex scanning for given size in balloc.c
because bitmap isn't very comfortable structure for this and that would
require many cycles. with mballoc it becomes possible. for example,
to find 1MB free chunk one has to choose group (mballoc tracks number
of free chunks in every buddy) and then scan just few bits. thus we can
produce better layout and improve performance.

 >>> this is especially important for arrays like
 >>> DDN and raid5/6 because they require stripe-aligned/-sized requests
 >>> for good throughput.

 AM> Does this not imply that there needs to be new linkage between the
 AM> filesystem and the lower layers?  So that raid/etc can inform the
 AM> filesystem driver about its alignment and striping requirements?

currently, we pass preferred I/O size with mount option (stripe=N).
I'd like that sort of communication between block driver and fs.
something like f_bsize.

 >>> also, last mballoc takes logical block into
 >>> account and can preallocate few chunks at different logical offsets
 >>> for a file. imagine torrent downloading different pieces from few peers.

 AM> hm.  You don't need anything as exotic as bittorrent to show up problems in
 AM> that area:

 AM> box:/usr/src/25> sudo bmap vmlinux | wc -l
 AM> 1152

well, this can be (and will be, I very hope :) solved by delayed
allocation. I mentioned torrent because it's often used to get
really large files. so large that they don't fit cache and delayed
allocation won't help much. preallocation can help, but then few
preallocations per file is required.

thanks, Alex
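The contrast between scanning a block bitmap and consulting buddy-style
counters can be sketched as follows. This is a toy userspace model, not the
mballoc implementation: the structure layout, TOY_MAX_ORDER and both function
names are assumptions made purely for illustration. The bitmap search has to
walk bits until it finds a long enough run; the buddy search only inspects one
small counter per order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ORDER 13	/* hypothetical: chunks of 1 .. 4096 blocks */

/* Hypothetical per-group summary: free_chunks[k] is the number of free,
 * naturally aligned chunks of 2^k blocks in the group. */
struct toy_group_buddy {
	unsigned free_chunks[TOY_MAX_ORDER];
};

/* Bitmap approach: linear scan for `len` consecutive clear bits.
 * Returns the first block of the run, or -1 if there is none. */
long toy_bitmap_find_run(const uint8_t *bitmap, size_t nbits, size_t len)
{
	size_t run = 0;

	assert(len > 0);
	for (size_t i = 0; i < nbits; i++) {
		int used = (bitmap[i / 8] >> (i % 8)) & 1;

		run = used ? 0 : run + 1;
		if (run == len)
			return (long)(i + 1 - len);
	}
	return -1;
}

/* Buddy approach: return the smallest order >= `order` that still has a
 * free chunk in this group, or -1.  At most TOY_MAX_ORDER counters are
 * examined, however large the group is. */
int toy_buddy_find_order(const struct toy_group_buddy *g, int order)
{
	assert(order >= 0 && order < TOY_MAX_ORDER);
	for (int k = order; k < TOY_MAX_ORDER; k++)
		if (g->free_chunks[k] > 0)
			return k;
	return -1;
}
```

With 4KB blocks (an assumption), a 1MB chunk is order 8, so choosing a group
amounts to calling toy_buddy_find_order(g, 8) on each candidate until one
succeeds.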
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
From: Suparna Bhattacharya @ 2006-10-06 6:52 UTC
To: Andreas Dilger
Cc: Andrew Morton, Randy Dunlap, Dave Kleikamp, ext4 development

On Fri, Oct 06, 2006 at 12:41:03AM -0600, Andreas Dilger wrote:
> On Oct 05, 2006  23:04 -0700, Andrew Morton wrote:
> > On Thu, 5 Oct 2006 23:53:05 -0600
> > Andreas Dilger <adilger@clusterfs.com> wrote:
> > > No, we want to leave it at ext4dev for a while, to make it very clear
> > > that this is still under development.  We want to get the existing
> > > patches upstream so they don't become completely unwieldy, and earlier
> > > testing is also good, but it is not yet feature complete.
> > >
> >
> > What features are missing?
>
> There are several under discussion; whether they all make it in is
> partly a function of how much time everyone has to work on them:
> - improved file allocation (multi-block alloc, delayed alloc; basically done)
> - fix 32000 subdirectory limit (patch exists, needs some e2fsck work)
> - nsec timestamps for mtime, atime, ctime, create time (patch exists,
>   needs some e2fsck work)
> - inode version field on disk (NFSv4, Lustre; prototype exists)
> - reduced mke2fs/e2fsck time via uninitialized groups (prototype exists)
> - journal checksumming for robustness, performance (prototype exists)
>
> Features like metadata checksumming have been discussed and planned for
> a bit but no patches exist yet so I'm not sure they're in the near-term
> roadmap.

I would add persistent preallocation (of uninitialized blocks) support to
the list. Right now we have only put in support to recognize uninitialized
extents so that we can add preallocation, but will be working on developing
the actual implementation for persistent preallocation.

Regards
Suparna

>
> > Heck, what features does it have now?  Guys, we cannot release this thing
> > to the public without telling them what it is, how to use it, where to get
> > the tools from and what the roadmap is.
>
> Features now:
> - ability to use filesystems > 16TB
> - extent format reduces metadata overhead (RAM, IO for access, transactions)
> - extent format more robust in face of on-disk corruption due to magics,
>   internal redundancy in tree
>
> Features soon (previously available, to be enabled by default by "mkefs.ext4"):
> - dir_index and resize inode will be on by default
> - large inodes will be used by default for fast EAs, nsec timestamps, etc
>
> Other features as above patches are committed.
>
> The big performance win will come with mballoc and delalloc.  CFS has
> been using mballoc for a few years already with Lustre, and IBM + Bull
> did a lot of benchmarking on it.  The reason it isn't in the first set of
> patches is partly a manageability issue, and partly because it doesn't
> directly affect the on-disk format (outside of much better allocation)
> so it isn't critical to get into the first round of changes.  I believe
> Alex is working on a new set of patches right now.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.

--
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
From: Theodore Tso @ 2006-10-06 12:21 UTC
To: Andrew Morton; +Cc: Dave Kleikamp, ext4 development

On Thu, Oct 05, 2006 at 08:55:26PM -0700, Andrew Morton wrote:
> On Thu, 05 Oct 2006 13:23:30 -0500
> Dave Kleikamp <shaggy@austin.ibm.com> wrote:
>
> > I have rebuilt the ext4/jbd2 patches against linux-2.6.19-rc1.  The
> > patch set is located at
> > ftp://kernel.org/pub/linux/kernel/people/shaggy/ext4/2.6.19-rc1/ext4-patches-2.6.19-rc1.tar.gz
> >
>
> So let me see if I have this right.
>
> You grab Alexandre's kit from http://www.bullopensource.org/ext4/20060926/
> and a plain old `mke2fs -j' gives a filesystem which will mount as ext3 or
> ext4.
>
> If you then mount this filesystem with `-t ext4dev -o extents', it becomes
> incompatible with the ext3 driver.  Yes?

I agree that's the wrong behaviour, and I've always hated the idea of
using mount -o options to enable ext3/4 features.  (We do it
with EA's and acl's, sigh, and that's wrong too, but at least I was
able to paper over that later by adding default mount option support
into the superblock.)

The right way to do this is to only enable a feature like extents
after using tune2fs -O extents, or creating a filesystem with mke2fs
-O extents.

Can we change the patches to do this, please?

						- Ted
* [PATCH] Get rid of extents mount option
From: Dave Kleikamp @ 2006-10-06 21:10 UTC
To: Theodore Tso; +Cc: Andrew Morton, ext4 development

On Fri, 2006-10-06 at 08:21 -0400, Theodore Tso wrote:
> On Thu, Oct 05, 2006 at 08:55:26PM -0700, Andrew Morton wrote:
> > So let me see if I have this right.
> >
> > You grab Alexandre's kit from http://www.bullopensource.org/ext4/20060926/
> > and a plain old `mke2fs -j' gives a filesystem which will mount as ext3 or
> > ext4.
> >
> > If you then mount this filesystem with `-t ext4dev -o extents', it becomes
> > incompatible with the ext3 driver.  Yes?
>
> I agree that's the wrong behaviour, and I've always hated the idea of
> using using mount -o options to enable ext3/4 features.  (When do it
> with EA's and acl's, sigh, and that's wrong too, but at least I was
> able to paper over that later by adding default mount option support
> into the superblock.)
>
> The right way to do this is to only enable a feature like extents
> after using tune2fs -O extents, or creating a filesystem with mke2fs
> -O extents.
>
> Can we change the patches to do this, please?

Something like this?

EXT4: Get rid of extents mount option

Enabling an ext4 file system to use extents should be done with
'tune2fs -O extents' or 'mke2fs -O extents', not with a mount option

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>

diff -Nurp linux-orig/fs/ext4/extents.c linux/fs/ext4/extents.c
--- linux-orig/fs/ext4/extents.c	2006-10-05 07:39:08.000000000 -0500
+++ linux/fs/ext4/extents.c	2006-10-06 15:45:59.000000000 -0500
@@ -1875,7 +1875,7 @@ void ext4_ext_init(struct super_block *s
 	 * possible initialization would be here
 	 */
 
-	if (test_opt(sb, EXTENTS)) {
+	if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS)) {
 		printk("EXT4-fs: file extents enabled");
 #ifdef AGRESSIVE_TEST
 		printk(", agressive tests");
@@ -1900,7 +1900,7 @@ void ext4_ext_init(struct super_block *s
  */
 void ext4_ext_release(struct super_block *sb)
 {
-	if (!test_opt(sb, EXTENTS))
+	if (!EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS))
 		return;
 
 #ifdef EXTENTS_STATS
diff -Nurp linux-orig/fs/ext4/ialloc.c linux/fs/ext4/ialloc.c
--- linux-orig/fs/ext4/ialloc.c	2006-10-05 07:39:08.000000000 -0500
+++ linux/fs/ext4/ialloc.c	2006-10-06 15:37:36.000000000 -0500
@@ -618,16 +618,9 @@ got:
 		ext4_std_error(sb, err);
 		goto fail_free_drop;
 	}
-	if (test_opt(sb, EXTENTS)) {
+	if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS)) {
 		EXT4_I(inode)->i_flags |= EXT4_EXTENTS_FL;
 		ext4_ext_tree_init(handle, inode);
-		if (!EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS)) {
-			err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh);
-			if (err) goto fail;
-			EXT4_SET_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS);
-			BUFFER_TRACE(EXT4_SB(sb)->s_sbh, "call ext4_journal_dirty_metadata");
-			err = ext4_journal_dirty_metadata(handle, EXT4_SB(sb)->s_sbh);
-		}
 	}
 
 	ext4_debug("allocating inode %lu\n", inode->i_ino);
diff -Nurp linux-orig/fs/ext4/super.c linux/fs/ext4/super.c
--- linux-orig/fs/ext4/super.c	2006-10-05 07:39:08.000000000 -0500
+++ linux/fs/ext4/super.c	2006-10-06 15:47:47.000000000 -0500
@@ -728,7 +728,7 @@ enum {
 	Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
 	Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
 	Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota,
-	Opt_grpquota, Opt_extents,
+	Opt_grpquota,
 };
 
 static match_table_t tokens = {
@@ -778,7 +778,6 @@ static match_table_t tokens = {
 	{Opt_quota, "quota"},
 	{Opt_usrquota, "usrquota"},
 	{Opt_barrier, "barrier=%u"},
-	{Opt_extents, "extents"},
 	{Opt_err, NULL},
 	{Opt_resize, "resize"},
 };
@@ -1111,9 +1110,6 @@ clear_qf_name:
 		case Opt_bh:
 			clear_opt(sbi->s_mount_opt, NOBH);
 			break;
-		case Opt_extents:
-			set_opt (sbi->s_mount_opt, EXTENTS);
-			break;
 		default:
 			printk (KERN_ERR
 				"EXT4-fs: Unrecognized mount option \"%s\" "

Shaggy
--
David Kleikamp
IBM Linux Technology Center
* [PATCH] Get rid of extents mount option - try 2
From: Dave Kleikamp @ 2006-10-06 21:21 UTC
To: Theodore Tso; +Cc: Andrew Morton, ext4 development

I rushed that out too quick.  This one cleans up the header files too.

EXT4: Get rid of extents mount option

Enabling an ext4 file system to use extents should be done with
'tune2fs -O extents' or 'mke2fs -O extents', not with a mount option

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>

diff -Nurp linux-orig/fs/ext4/extents.c linux/fs/ext4/extents.c
--- linux-orig/fs/ext4/extents.c	2006-10-05 07:39:08.000000000 -0500
+++ linux/fs/ext4/extents.c	2006-10-06 15:45:59.000000000 -0500
@@ -1875,7 +1875,7 @@ void ext4_ext_init(struct super_block *s
 	 * possible initialization would be here
 	 */
 
-	if (test_opt(sb, EXTENTS)) {
+	if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS)) {
 		printk("EXT4-fs: file extents enabled");
 #ifdef AGRESSIVE_TEST
 		printk(", agressive tests");
@@ -1900,7 +1900,7 @@ void ext4_ext_init(struct super_block *s
  */
 void ext4_ext_release(struct super_block *sb)
 {
-	if (!test_opt(sb, EXTENTS))
+	if (!EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS))
 		return;
 
 #ifdef EXTENTS_STATS
diff -Nurp linux-orig/fs/ext4/ialloc.c linux/fs/ext4/ialloc.c
--- linux-orig/fs/ext4/ialloc.c	2006-10-05 07:39:08.000000000 -0500
+++ linux/fs/ext4/ialloc.c	2006-10-06 15:37:36.000000000 -0500
@@ -618,16 +618,9 @@ got:
 		ext4_std_error(sb, err);
 		goto fail_free_drop;
 	}
-	if (test_opt(sb, EXTENTS)) {
+	if (EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS)) {
 		EXT4_I(inode)->i_flags |= EXT4_EXTENTS_FL;
 		ext4_ext_tree_init(handle, inode);
-		if (!EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS)) {
-			err = ext4_journal_get_write_access(handle, EXT4_SB(sb)->s_sbh);
-			if (err) goto fail;
-			EXT4_SET_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS);
-			BUFFER_TRACE(EXT4_SB(sb)->s_sbh, "call ext4_journal_dirty_metadata");
-			err = ext4_journal_dirty_metadata(handle, EXT4_SB(sb)->s_sbh);
-		}
 	}
 
 	ext4_debug("allocating inode %lu\n", inode->i_ino);
diff -Nurp linux-orig/fs/ext4/super.c linux/fs/ext4/super.c
--- linux-orig/fs/ext4/super.c	2006-10-05 07:39:08.000000000 -0500
+++ linux/fs/ext4/super.c	2006-10-06 15:47:47.000000000 -0500
@@ -728,7 +728,7 @@ enum {
 	Opt_usrjquota, Opt_grpjquota, Opt_offusrjquota, Opt_offgrpjquota,
 	Opt_jqfmt_vfsold, Opt_jqfmt_vfsv0, Opt_quota, Opt_noquota,
 	Opt_ignore, Opt_barrier, Opt_err, Opt_resize, Opt_usrquota,
-	Opt_grpquota, Opt_extents,
+	Opt_grpquota,
 };
 
 static match_table_t tokens = {
@@ -778,7 +778,6 @@ static match_table_t tokens = {
 	{Opt_quota, "quota"},
 	{Opt_usrquota, "usrquota"},
 	{Opt_barrier, "barrier=%u"},
-	{Opt_extents, "extents"},
 	{Opt_err, NULL},
 	{Opt_resize, "resize"},
 };
@@ -1111,9 +1110,6 @@ clear_qf_name:
 		case Opt_bh:
 			clear_opt(sbi->s_mount_opt, NOBH);
 			break;
-		case Opt_extents:
-			set_opt (sbi->s_mount_opt, EXTENTS);
-			break;
 		default:
 			printk (KERN_ERR
 				"EXT4-fs: Unrecognized mount option \"%s\" "
diff -Nurp linux-orig/include/linux/ext4_fs.h linux/include/linux/ext4_fs.h
--- linux-orig/include/linux/ext4_fs.h	2006-10-05 07:39:08.000000000 -0500
+++ linux/include/linux/ext4_fs.h	2006-10-06 16:13:07.000000000 -0500
@@ -399,7 +399,6 @@ struct ext4_inode {
 #define EXT4_MOUNT_QUOTA		0x80000 /* Some quota option set */
 #define EXT4_MOUNT_USRQUOTA		0x100000 /* "old" user quota */
 #define EXT4_MOUNT_GRPQUOTA		0x200000 /* "old" group quota */
-#define EXT4_MOUNT_EXTENTS		0x400000 /* Extents support */
 
 /* Compatibility, for having both ext2_fs.h and ext4_fs.h included at once */
 #ifndef _LINUX_EXT2_FS_H
diff -Nurp linux-orig/include/linux/ext4_jbd2.h linux/include/linux/ext4_jbd2.h
--- linux-orig/include/linux/ext4_jbd2.h	2006-10-05 07:39:08.000000000 -0500
+++ linux/include/linux/ext4_jbd2.h	2006-10-06 16:17:20.000000000 -0500
@@ -33,7 +33,7 @@
 
 #define EXT4_SINGLEDATA_TRANS_BLOCKS(sb)				\
 	(EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS)	\
-		|| test_opt(sb, EXTENTS) ? 27U : 8U)
+	 ? 27U : 8U)
 
 /* Extended attribute operations touch at most two data buffers,
  * two bitmap buffers, and two group summaries, in addition to the inode

--
David Kleikamp
IBM Linux Technology Center
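The behavioural change in this patch can be modelled in a few lines of
userspace C. This is a simplified sketch, not kernel code: struct toy_sb and
both functions are hypothetical, TOY_MOUNT_EXTENTS reuses the 0x400000 value
the patch removes, and TOY_FEATURE_INCOMPAT_EXTENTS is assumed to be 0x0040.
Before the patch, creating an inode under `-o extents' silently set the
incompat flag on disk; afterwards, inode creation only reads the flag and
never changes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MOUNT_EXTENTS		0x400000u	/* value removed by the patch */
#define TOY_FEATURE_INCOMPAT_EXTENTS	0x0040u		/* assumed on-disk flag value */

/* Hypothetical, drastically reduced superblock state. */
struct toy_sb {
	uint32_t feature_incompat;	/* persistent, lives on disk */
	uint32_t mount_opt;		/* per-mount, from "-o ..." */
};

/* Old behaviour: the mount option decides, and the first inode created
 * this way flips the on-disk incompat flag as a side effect. */
int toy_old_inode_uses_extents(struct toy_sb *sb)
{
	assert(sb != NULL);
	if (!(sb->mount_opt & TOY_MOUNT_EXTENTS))
		return 0;
	sb->feature_incompat |= TOY_FEATURE_INCOMPAT_EXTENTS;
	return 1;
}

/* New behaviour: only the on-disk flag matters, and it is never
 * modified on this path. */
int toy_new_inode_uses_extents(const struct toy_sb *sb)
{
	assert(sb != NULL);
	return (sb->feature_incompat & TOY_FEATURE_INCOMPAT_EXTENTS) != 0;
}
```

In the new model the flag can only be set by a tool that edits the superblock
directly, which is the role proposed for `tune2fs -O extents' and
`mke2fs -O extents' earlier in the thread.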
* Re: [PATCH] Get rid of extents mount option - try 2
From: Andrew Morton @ 2006-10-06 22:32 UTC
To: Dave Kleikamp; +Cc: Theodore Tso, ext4 development

On Fri, 06 Oct 2006 16:21:40 -0500
Dave Kleikamp <shaggy@austin.ibm.com> wrote:

> Enabling an ext4 file system to use extents should be done with
> 'tune2fs -O extents' or 'mke2fs -O extents', not with a mount option

But the mke2fs which I built by applying
http://www.bullopensource.org/ext4/20060926/ to e2fsprogs-1.39 doesn't
recognise `-O extents'.  So the only way I can use extents is `mount -o
extents'.

What am I missing here?
* Re: [PATCH] Get rid of extents mount option - try 2
From: Dave Kleikamp @ 2006-10-06 23:20 UTC
To: Andrew Morton; +Cc: Theodore Tso, ext4 development

On Fri, 2006-10-06 at 15:32 -0700, Andrew Morton wrote:
> On Fri, 06 Oct 2006 16:21:40 -0500
> Dave Kleikamp <shaggy@austin.ibm.com> wrote:
>
> > Enabling an ext4 file system to use extents should be done with
> > 'tune2fs -O extents' or 'mke2fs -O extents', not with a mount option
>
> But the mke2fs which I built by applying
> http://www.bullopensource.org/ext4/20060926/ to e2fsprogs-1.39 doesn't
> recognise `-O extents'.  So the only way I can use extents is `mount -o
> extents'.
>
> What am I missing here?

To be honest, I've been lazy and I haven't even tried to get the new
e2fsprogs.  I just grabbed the latest from the mercurial repository,
http://e2fsprogs.sourceforge.net/e2fsprogs-hacking.html , and it doesn't
work for me either.  Ted?

Hold off on the patch until we figure it out.  :-)

Shaggy
--
David Kleikamp
IBM Linux Technology Center
* Re: [PATCH] Get rid of extents mount option - try 2
From: Theodore Tso @ 2006-10-07 4:14 UTC
To: Dave Kleikamp; +Cc: Andrew Morton, ext4 development

On Fri, Oct 06, 2006 at 06:20:00PM -0500, Dave Kleikamp wrote:
> To be honest, I've been lazy and I haven't even tried to get the new
> e2fsprogs.  I just grabbed the latest from the mercurial repository,
> http://e2fsprogs.sourceforge.net/e2fsprogs-hacking.html , and it doesn't
> work for me either.  Ted?
>
> Hold off on the patch until we figure it out.  :-)

I've been busy cleaning up the userspace extents patches before I'm
willing to accept them into the mainline e2fsprogs tree.  So it's not
in Mercurial yet.  It's coming soon; but in the meantime, my
interim patchset which I've been using to hack on the extents patches
plus signed-char-powerpc-dirhash problem can be found at:

ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs-interim

Both a rolled-up patch file plus a broken-out tar.gz file are
available there.  The current version on the above URL is
e2fsprogs-1.39-tyt1.  Note that you will have to take the
f_extents/image.gz from the broken-out tar file and copy it into
tests/f_extents/image.gz or the f_extents regression test will fail.
In addition, the f_lotsbad regression test is also known to fail
in 1.39-tyt1, and that regression test failure can be safely ignored
for now.

This should be good enough for the extents patches that Shaggy has
been queuing up.

						- Ted

P.S.  Before we add the extents patch, I just thought of one
additional change that might be good to add.  Could we add a u32
field in the superblock which counts the number of files with extents,
and is automatically incremented and decremented as necessary by the
kernel, and which can be checked by e2fsck?  It would be really useful
for making it easy for tune2fs to be able to tell if it can safely
remove the extents feature from the filesystem, or whether it should
refuse such a request.
* Re: [PATCH] Get rid of extents mount option - try 2
2006-10-07 4:14 ` Theodore Tso
@ 2006-10-07 15:53 ` Dave Kleikamp
2006-10-07 17:20 ` Theodore Tso
0 siblings, 1 reply; 45+ messages in thread
From: Dave Kleikamp @ 2006-10-07 15:53 UTC (permalink / raw)
To: Theodore Tso; +Cc: Andrew Morton, ext4 development

On Sat, 2006-10-07 at 00:14 -0400, Theodore Tso wrote:
> On Fri, Oct 06, 2006 at 06:20:00PM -0500, Dave Kleikamp wrote:
> > To be honest, I've been lazy and I haven't even tried to get the new
> > e2fsprogs. I just grabbed the latest from the mercurial repository,
> > http://e2fsprogs.sourceforge.net/e2fsprogs-hacking.html , and it doesn't
> > work for me either. Ted?
> >
> > Hold off on the patch until we figure it out. :-)
>
> I've been busy cleaning up the userspace extents patches before I'm
> willing to accept them into the mainline e2fsprogs tree. So it's not
> in Mercurial yet. It's coming soon; but in the meantime, my
> interim patchset which I've been using to hack on the extents patches
> plus the signed-char-powerpc-dirhash problem can be found at:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs-interim
>
> Both a rolled-up patch file and a broken-out tar.gz file are
> available there. The current version at the above URL is
> e2fsprogs-1.39-tyt1. Note that you will have to take the
> f_extents/image.gz from the broken-out tar file and copy it into
> tests/f_extents/image.gz or the f_extents regression test will fail.
> In addition, the f_lotsbad regression test is also known to fail
> in 1.39-tyt1, and that regression test failure can be safely ignored
> for now.
>
> This should be good enough for the extents patches that Shaggy has
> been queuing up.

I noticed we are missing Documentation/filesystems/ext4.txt. Over the
weekend, I'll try to put something together with instructions on getting
the right version of e2fsprogs, etc.

> P.S. Before we add the extents patch, I just thought of one
> additional change that might be good to add. Could we add a u32
> field in the superblock which counts the number of files with extents,
> and is automatically incremented and decremented as necessary by the
> kernel, and which can be checked by e2fsck? It would be really useful
> for making it easy for tune2fs to be able to tell if it can safely
> remove the extents feature from the filesystem, or whether it should
> refuse such a request.

I guess this would be useful to turn the feature off immediately after
turning it on, but with the removal of the extents mount option, we no
longer have the ability to make old-style files once the feature is
turned on. So it's unlikely that you'd be able to turn the feature off
once a file system has been used.

Also, do we update the superblock in every transaction that creates or
deletes a file? Otherwise, how do we guarantee the count is accurate
after replaying the journal?

Shaggy
--
David Kleikamp
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Get rid of extents mount option - try 2
2006-10-07 15:53 ` Dave Kleikamp
@ 2006-10-07 17:20 ` Theodore Tso
2006-10-07 19:45 ` Alex Tomas
` (2 more replies)
0 siblings, 3 replies; 45+ messages in thread
From: Theodore Tso @ 2006-10-07 17:20 UTC (permalink / raw)
To: Dave Kleikamp; +Cc: Andrew Morton, ext4 development

On Sat, Oct 07, 2006 at 10:53:47AM -0500, Dave Kleikamp wrote:
> I noticed we are missing Documentation/filesystems/ext4.txt. Over the
> weekend, I'll try to put something together with instructions on getting
> the right version of e2fsprogs, etc.

For now just say to grab the latest from:

ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs-interim

We'll have to upgrade it once we have a released version of e2fsprogs
that supports extents (although by then we may need e2fsprogs-interim
for 48 or 64 bit extents, or whatever next new feature we're working
on :-).

> I guess this would be useful to turn the feature off immediately after
> turning it on, but with the removal of the extents mount option, we no
> longer have the ability to make old-style files once the feature is
> turned on. So it's unlikely that you'd be able to turn the feature off
> once a file system has been used.

Well, we could have tune2fs scan all inodes, and have it allocate
triple/double/indirect blocks to replace the extent tree, at some
point. The count would allow us to turn it off immediately after
turning it on without forcing the full scan of all inodes. Maybe it's
not worth the overhead though.

> Also, do we update the superblock in every transaction that creates or
> deletes a file? Otherwise, how do we guarantee the count is accurate
> after replaying the journal?

Yes, we do. The number of free inodes has to be kept up-to-date,
after all, so the superblock is marked dirty and as being part of the
transaction.

- Ted
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Get rid of extents mount option - try 2
2006-10-07 17:20 ` Theodore Tso
@ 2006-10-07 19:45 ` Alex Tomas
2006-10-07 19:57 ` Andrew Morton
2006-10-10 18:48 ` Dave Kleikamp
2 siblings, 0 replies; 45+ messages in thread
From: Alex Tomas @ 2006-10-07 19:45 UTC (permalink / raw)
To: Theodore Tso; +Cc: Dave Kleikamp, Andrew Morton, ext4 development

>>>>> Theodore Tso (TT) writes:

>> Also, do we update the superblock in every transaction that creates or
>> deletes a file? Otherwise, how do we guarantee the count is accurate
>> after replaying the journal?

TT> Yes, we do. The number of free inodes has to be kept up-to-date,
TT> after all, so the superblock is marked dirty and as being part of the
TT> transaction.

actually, not any more. we use group descriptors to initialize
free inodes counter at mount time.

thanks, Alex
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Get rid of extents mount option - try 2
2006-10-07 17:20 ` Theodore Tso
2006-10-07 19:45 ` Alex Tomas
@ 2006-10-07 19:57 ` Andrew Morton
2006-10-10 18:48 ` Dave Kleikamp
2 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2006-10-07 19:57 UTC (permalink / raw)
To: Theodore Tso; +Cc: Dave Kleikamp, ext4 development

On Sat, 7 Oct 2006 13:20:27 -0400
Theodore Tso <tytso@mit.edu> wrote:

> > Also, do we update the superblock in every transaction that creates or
> > deletes a file? Otherwise, how do we guarantee the count is accurate
> > after replaying the journal?
>
> Yes, we do. The number of free inodes has to be kept up-to-date,
> after all, so the superblock is marked dirty and as being part of the
> transaction.

Actually we cheat, and we don't keep the superblock free inodes counter up
to date in real time. Done for CPU consumption reasons, but it was perhaps
a false optimisation, given that we still have a system-wide inode_lock.

The free inode count is already triply redundant: inode table scan, inode
bitmap scan, ext4_group_desc.bg_free_inodes_count. Making it quadruply
redundant seemed a bit over the top.

At runtime the definitive free-inodes count is the sum of the
per-blockgroup free-inode counts. On clean shutdown we regenerate that and
write it into the superblock.
^ permalink raw reply [flat|nested] 45+ messages in thread
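The accounting scheme described in the message above (per-blockgroup counts are authoritative at runtime, and the superblock total is regenerated on clean shutdown) can be sketched roughly as follows. The structures and `sketch_*` functions are hypothetical simplifications for illustration; only the field name `bg_free_inodes_count` comes from the message, and the real kernel code is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a block group descriptor. */
struct group_desc_sketch {
    uint16_t bg_free_inodes_count; /* free inodes in this block group */
};

/* Simplified stand-in for the superblock's cached total. */
struct super_sketch {
    uint32_t s_free_inodes_count; /* may be stale while the fs is mounted */
};

/* Runtime answer: sum the per-group counters instead of trusting the superblock. */
uint32_t sketch_count_free_inodes(const struct group_desc_sketch *groups,
                                  size_t ngroups)
{
    uint32_t total = 0;

    for (size_t i = 0; i < ngroups; i++)
        total += groups[i].bg_free_inodes_count;
    return total;
}

/* Clean shutdown: regenerate the total and store it in the superblock. */
void sketch_sync_super_on_unmount(struct super_sketch *sb,
                                  const struct group_desc_sketch *groups,
                                  size_t ngroups)
{
    sb->s_free_inodes_count = sketch_count_free_inodes(groups, ngroups);
}
```

Under this scheme a proposed extent-file counter could not simply piggyback on a superblock update per transaction, because no such update happens for the free-inode count.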
* Re: [PATCH] Get rid of extents mount option - try 2
2006-10-07 17:20 ` Theodore Tso
2006-10-07 19:45 ` Alex Tomas
2006-10-07 19:57 ` Andrew Morton
@ 2006-10-10 18:48 ` Dave Kleikamp
2006-10-10 21:07 ` Theodore Tso
2 siblings, 1 reply; 45+ messages in thread
From: Dave Kleikamp @ 2006-10-10 18:48 UTC (permalink / raw)
To: Theodore Tso; +Cc: Andrew Morton, ext4 development

On Sat, 2006-10-07 at 13:20 -0400, Theodore Tso wrote:
> On Sat, Oct 07, 2006 at 10:53:47AM -0500, Dave Kleikamp wrote:
> > I noticed we are missing Documentation/filesystems/ext4.txt. Over the
> > weekend, I'll try to put something together with instructions on getting
> > the right version of e2fsprogs, etc.
>
> For now just say to grab the latest from:
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs-interim

Ted,
Do you think it's possible to put a source tarball out there with the
patches applied? It's confusing to untar the tarball only to find the
patchset. Otherwise, we'll have to beef up the instructions a little
bit.

Thanks,
Shaggy
--
David Kleikamp
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Get rid of extents mount option - try 2
2006-10-10 18:48 ` Dave Kleikamp
@ 2006-10-10 21:07 ` Theodore Tso
2006-10-10 21:18 ` Andrew Morton
0 siblings, 1 reply; 45+ messages in thread
From: Theodore Tso @ 2006-10-10 21:07 UTC (permalink / raw)
To: Dave Kleikamp; +Cc: Andrew Morton, ext4 development

On Tue, Oct 10, 2006 at 01:48:18PM -0500, Dave Kleikamp wrote:
> On Sat, 2006-10-07 at 13:20 -0400, Theodore Tso wrote:
> > On Sat, Oct 07, 2006 at 10:53:47AM -0500, Dave Kleikamp wrote:
> > > I noticed we are missing Documentation/filesystems/ext4.txt. Over the
> > > weekend, I'll try to put something together with instructions on getting
> > > the right version of e2fsprogs, etc.
> >
> > For now just say to grab the latest from:
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs-interim
>
> Ted,
> Do you think it's possible to put a source tarball out there with the
> patches applied? It's confusing to untar the tarball only to find the
> patchset. Otherwise, we'll have to beef up the instructions a little
> bit.

I was assuming that someone who knew how to deal with the -mm patchset
would know how to deal with a patchset. I agree it should be
changed to e2fsprogs-1.39-tyt1-broken-out.tar.gz, though.

We can put a source tarball up, but sometimes the e2fsprogs-interim
patches will be, well, about as stable as the -mm tree. So people who
assume a completely stable release may end up being a little
disappointed. Hopefully by the time the ext4 stuff gets merged into
the mainline kernel we'll have an e2fsprogs-WIP release which will
support extents, and then we can tell people to use that....

- Ted
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Get rid of extents mount option - try 2
2006-10-10 21:07 ` Theodore Tso
@ 2006-10-10 21:18 ` Andrew Morton
0 siblings, 0 replies; 45+ messages in thread
From: Andrew Morton @ 2006-10-10 21:18 UTC (permalink / raw)
To: Theodore Tso; +Cc: Dave Kleikamp, ext4 development

On Tue, 10 Oct 2006 17:07:18 -0400
Theodore Tso <tytso@mit.edu> wrote:

> On Tue, Oct 10, 2006 at 01:48:18PM -0500, Dave Kleikamp wrote:
> > On Sat, 2006-10-07 at 13:20 -0400, Theodore Tso wrote:
> > > On Sat, Oct 07, 2006 at 10:53:47AM -0500, Dave Kleikamp wrote:
> > > > I noticed we are missing Documentation/filesystems/ext4.txt. Over the
> > > > weekend, I'll try to put something together with instructions on getting
> > > > the right version of e2fsprogs, etc.
> > >
> > > For now just say to grab the latest from:
> > >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs-interim
> >
> > Ted,
> > Do you think it's possible to put a source tarball out there with the
> > patches applied? It's confusing to untar the tarball only to find the
> > patchset. Otherwise, we'll have to beef up the instructions a little
> > bit.
>
> I was assuming that someone who knew how to deal with the -mm patchset
> would know how to deal with a patchset. I agree it should be
> changed to e2fsprogs-1.39-tyt1-broken-out.tar.gz, though.
>
> We can put a source tarball up, but sometimes the e2fsprogs-interim
> patches will be, well, about as stable as the -mm tree.

So put a stable one up ;)

Are people likely to care about e2fsprogs much? They'll just want to do
mkfs, maybe the occasional fsck. At this stage it's the kernel code we
want people to beat on.

> So people who
> assume a completely stable release may end up being a little
> disappointed. Hopefully by the time the ext4 stuff gets merged into
> the mainline kernel we'll have an e2fsprogs-WIP release which will
> support extents, and then we can tell people to use that....

I hope you didn't have anything else planned for this evening ;)
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH] Get rid of extents mount option - try 2
2006-10-06 21:21 ` [PATCH] Get rid of extents mount option - try 2 Dave Kleikamp
2006-10-06 22:32 ` Andrew Morton
@ 2006-10-11 17:16 ` Andreas Dilger
1 sibling, 0 replies; 45+ messages in thread
From: Andreas Dilger @ 2006-10-11 17:16 UTC (permalink / raw)
To: Dave Kleikamp; +Cc: Theodore Tso, Andrew Morton, ext4 development

On Oct 06, 2006 16:21 -0500, Dave Kleikamp wrote:
> EXT4: Get rid of extents mount option
>
> Enabling an ext4 file system to use extents should be done with
> 'tune2fs -O extents' or 'mke2fs -O extents', not with a mount option

I would agree that the presence of INCOMPAT_EXTENTS should imply the
EXTENTS mount option, but it is also desirable to be able to turn this
off for testing. In our internal patches we also have a "noextents"
mount option to disable extents at runtime even if "extents" was given
as a default mount option.

So, I would leave most of the code as-is (with "test_opt(sb, EXTENTS)"),
and just have ext3_fill_super() enable EXT4_MOUNT_EXTENTS if
INCOMPAT_EXTENTS is set. This is a tiny bit tricky since parse_options()
is called before the superblock is read, so I suspect we'll need a
separate EXT4_MOUNT_NOEXTENTS to distinguish between no mount "extents"
option given and "noextents" disabling this. The Opt_noextents handling
would clear EXT4_MOUNT_EXTENTS, and vice versa.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
^ permalink raw reply [flat|nested] 45+ messages in thread
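One way to read the suggestion above is as a small two-flag state machine: the option parser records an explicit "extents" or "noextents" request, and the superblock feature bit later supplies the default unless "noextents" was given. The C sketch below illustrates that reading only. The `SKETCH_*` flag values and both functions are hypothetical, not the EXT4_MOUNT_* definitions or the parse_options() code from the patches.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values standing in for EXT4_MOUNT_EXTENTS / _NOEXTENTS. */
#define SKETCH_MOUNT_EXTENTS    0x1u
#define SKETCH_MOUNT_NOEXTENTS  0x2u
/* Placeholder bit standing in for the INCOMPAT_EXTENTS feature flag. */
#define SKETCH_INCOMPAT_EXTENTS 0x0040u

/* Option parsing: each option sets its own flag and clears the opposite one. */
uint32_t sketch_parse_option(uint32_t mount_opt, const char *opt)
{
    if (strcmp(opt, "extents") == 0) {
        mount_opt |= SKETCH_MOUNT_EXTENTS;
        mount_opt &= ~SKETCH_MOUNT_NOEXTENTS;
    } else if (strcmp(opt, "noextents") == 0) {
        mount_opt |= SKETCH_MOUNT_NOEXTENTS;
        mount_opt &= ~SKETCH_MOUNT_EXTENTS;
    }
    return mount_opt;
}

/*
 * Superblock-derived default: the feature bit turns extents on unless the
 * user explicitly asked for "noextents".
 */
uint32_t sketch_apply_super_defaults(uint32_t mount_opt,
                                     uint32_t feature_incompat)
{
    if ((feature_incompat & SKETCH_INCOMPAT_EXTENTS) &&
        !(mount_opt & SKETCH_MOUNT_NOEXTENTS))
        mount_opt |= SKETCH_MOUNT_EXTENTS;
    return mount_opt;
}
```

The separate NOEXTENTS flag is what lets the default step tell "no option given" apart from "explicitly disabled", which a single EXTENTS bit cannot express.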
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
2006-10-05 18:23 Updated ext4/jbd2 patches based on 2.6.19-rc1 Dave Kleikamp
` (3 preceding siblings ...)
2006-10-06 3:55 ` Updated ext4/jbd2 patches based on 2.6.19-rc1 Andrew Morton
@ 2006-10-06 4:31 ` Andrew Morton
2006-10-06 5:58 ` Andreas Dilger
4 siblings, 1 reply; 45+ messages in thread
From: Andrew Morton @ 2006-10-06 4:31 UTC (permalink / raw)
To: Dave Kleikamp; +Cc: ext4 development

If you mount the filesystem with `-t ext4dev -o extents' then create some
extenty files, then unmount it and then mount it without `-o extents', the
driver will then refuse to create extenty files.

IOW: you need to give it `-o extents' each time.

That seems fairly pointless. In fact, if I'd created the fs with `mke2fs
-O extents' (which doesn't work at present) then I'd expect it to use
extents from then on, without requiring `mount -o extents'.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
2006-10-06 4:31 ` Updated ext4/jbd2 patches based on 2.6.19-rc1 Andrew Morton
@ 2006-10-06 5:58 ` Andreas Dilger
2006-10-06 6:10 ` Andrew Morton
0 siblings, 1 reply; 45+ messages in thread
From: Andreas Dilger @ 2006-10-06 5:58 UTC (permalink / raw)
To: Andrew Morton; +Cc: Dave Kleikamp, ext4 development

On Oct 05, 2006 21:31 -0700, Andrew Morton wrote:
> If you mount the filesystem with `-t ext4dev -o extents' then create some
> extenty files, then unmount it and then mount it without `-o extents', the
> driver will then refuse to create extenty files.
>
> IOW: you need to give it `-o extents' each time.
>
> That seems fairly pointless. In fact, if I'd created the fs with `mke2fs
> -O extents' (which doesn't work at present) then I'd expect it to use
> extents from then on, without requiring `mount -o extents'.

I think this is an oversight. For Lustre we wanted the ability to mount
ext3 filesystems with or without extents, because different customers
have different levels of tolerance for risk. These days all of our
customers use extents (better performance in conjunction with mballoc),
but the patches have not been changed for ext4 (which should really
default to using extents on a filesystem with the INCOMPAT_EXTENTS feature
set unless told otherwise). That is a necessity for filesystems larger
than 2^32 blocks, since there is no way to create old block-mapped files
past that limit.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
2006-10-06 5:58 ` Andreas Dilger
@ 2006-10-06 6:10 ` Andrew Morton
2006-10-06 6:48 ` Andreas Dilger
0 siblings, 1 reply; 45+ messages in thread
From: Andrew Morton @ 2006-10-06 6:10 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Dave Kleikamp, ext4 development

On Thu, 5 Oct 2006 23:58:29 -0600
Andreas Dilger <adilger@clusterfs.com> wrote:

> but the patches have not been changed for ext4 (which should really
> default to using extents on a filesystem with the INCOMPAT_EXTENTS feature
> set unless told otherwise). That is a necessity for filesystems larger
> than 2^32 blocks, since there is no way to create old block-mapped files
> past that limit.

That's news to me. So we only use 48-bit block numbers for extents and
not for old-style indirect blocks?

How much performance improvement do they get, btw? CPU or IO? I'm not
noticing any difference.

Has been a while since I did any fs testing. Boy, ext3 is beating the crap
out of ext2 for quality of file layout.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: Updated ext4/jbd2 patches based on 2.6.19-rc1
2006-10-06 6:10 ` Andrew Morton
@ 2006-10-06 6:48 ` Andreas Dilger
0 siblings, 0 replies; 45+ messages in thread
From: Andreas Dilger @ 2006-10-06 6:48 UTC (permalink / raw)
To: Andrew Morton; +Cc: Dave Kleikamp, ext4 development

On Oct 05, 2006 23:10 -0700, Andrew Morton wrote:
> On Thu, 5 Oct 2006 23:58:29 -0600
> Andreas Dilger <adilger@clusterfs.com> wrote:
> > but the patches have not been changed for ext4 (which should really
> > default to using extents on a filesystem with the INCOMPAT_EXTENTS feature
> > set unless told otherwise). That is a necessity for filesystems larger
> > than 2^32 blocks, since there is no way to create old block-mapped files
> > past that limit.
>
> That's news to me. So we only use 48-bit block numbers for extents and
> not for old-style indirect blocks?

Correct. The block-mapped {d,t,}indirect blocks chewed up enough space
as it was (0.1% of the file size) without doubling the block pointers.
Things like truncate hurt pretty badly because of that, as does the
increased IO load to read them and memory pressure due to keeping them
in RAM.

> How much performance improvement do they get, btw? CPU or IO? I'm not
> noticing any difference.

As mentioned in my other email, the big performance win will come from
the multi-block allocation (mballoc) and delayed allocation (delalloc)
from Alex.

The mballoc patch allows a 1MB write to get a 1MB-aligned and contiguous
chunk of disk, instead of the next 256 blocks that might be free. Having
1MB alignment is good for 10% or more on some RAID systems to avoid
writing partial stripes (which also requires a read).

Delalloc allows the filesystem to actually submit 1MB writes at once
without doing the block allocation in prepare_write(). Better for
picking free space, and avoids needless extent tree
insertion/rebalancing.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
^ permalink raw reply [flat|nested] 45+ messages in thread
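The figures quoted in the message above can be checked with a little arithmetic. The C sketch below assumes 4 KiB blocks (consistent with "256 blocks" per 1MB write) and 4-byte block pointers, and it counts only the one leaf pointer needed per data block, ignoring the much smaller cost of the double- and triple-indirect levels. The `sketch_*` helpers are hypothetical names introduced for this illustration.

```c
#include <stdint.h>

/*
 * Mapping overhead of block-mapped files in parts per million:
 * one pointer of ptr_size bytes is needed for every data block.
 */
uint64_t sketch_pointer_overhead_ppm(uint32_t block_size, uint32_t ptr_size)
{
    return (uint64_t)ptr_size * 1000000u / block_size;
}

/* Number of filesystem blocks covered by a single write of write_bytes. */
uint32_t sketch_blocks_per_write(uint32_t write_bytes, uint32_t block_size)
{
    return write_bytes / block_size;
}

/* Largest size addressable with 32-bit block numbers, in bytes. */
uint64_t sketch_blockmap_limit_bytes(uint32_t block_size)
{
    return ((uint64_t)1 << 32) * block_size;
}
```

With these assumptions, 4-byte pointers cost about 976 ppm, or roughly 0.1% of the file size, and 8-byte pointers would double that to about 1953 ppm. A 1 MiB write spans 256 blocks, and 2^32 blocks of 4 KiB come to 16 TiB.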