Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3

linux-fsdevel.vger.kernel.org archive mirror

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
       [not found]                                   ` <6nE4Z-4If-55@gated-at.bofh.it>
@ 2006-06-14 16:45                                     ` Bodo Eggert
  2006-06-14 17:28                                       ` Andreas Dilger
  0 siblings, 1 reply; 104+ messages in thread
From: Bodo Eggert @ 2006-06-14 16:45 UTC (permalink / raw)
  To: grundig, jeff, alex, alan, chase.venters, torvalds, adilger, akpm,
	ext2-devel, linux-kernel, cmm, linux-fsdevel

grundig <grundig@teleline.es> wrote:
> Alex Tomas <alex@clusterfs.com> wrote:

>> that's your point of view. mine is that this option (and code)
>> to be used only when needed.
> 
> Distros may ignore your opinion and may enable it, and users won't know
> that it's enabled or even if such feature exist - until they try to run
> an older kernel. If almost nobody needs this feature, why not avoid
> problems by not merging it and maintaining it separated from the
> main tree?

Distros might patch their kernel to support this feature and enable it,
therefore you MUST NOT release new features AT ALL!!!1 - NOT

If a distro decides to enable a non-backward-compatible feature without
warning, it's their fault.

BTW: Upgrading a filesystem by using mount options _and_ forcing that
option to be supplied on subsequent mounts is a BUG. If that is what the
current code demands, it should be fixed ASAP. I hope that's not what
the current code does.
-- 
I thank GMX for sabotaging the use of my addresses by means of lies
spread via SPF.

http://david.woodhou.se/why-not-spf.html

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-14 16:45                                     ` [Ext2-devel] [RFC 0/13] extents and 48bit ext3 Bodo Eggert
@ 2006-06-14 17:28                                       ` Andreas Dilger
  0 siblings, 0 replies; 104+ messages in thread
From: Andreas Dilger @ 2006-06-14 17:28 UTC (permalink / raw)
  To: 7eggert
  Cc: akpm, jeff, ext2-devel, linux-kernel, chase.venters, torvalds,
	cmm, linux-fsdevel, grundig, alex, alan

On Jun 14, 2006  18:45 +0200, Bodo Eggert wrote:
> BTW: Upgrading a filesystem by using mount options _and_ forcing that
> option to be supplied on subsequent mounts is a BUG. If that is what the
> current code demands, it should be fixed ASAP. I hope that's not what
> the current code does.

If you don't remount with "-o extents" all it (currently) means is that
new files will not be created with extents.  Existing extent-mapped files
will continue to work.  It was done this way so that if some serious
problem was found with extents there was a fallback position to "normal"
block mapped files and the damage would be limited to files created while
mounted with "-o extents".
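
A minimal sketch of the behaviour described above, using hypothetical
names (demo_mount, demo_inode) that are not taken from the patches.  It
assumes the mapping type is recorded per file at creation time, so the
mount option only steers how new files are created:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for mount state and per-inode state. */
struct demo_mount { bool opt_extents; };       /* was "-o extents" given? */
struct demo_inode { bool extent_mapped; };     /* decided once, at creation */

/* New files follow the mount option in force when they are created. */
static inline struct demo_inode demo_create_file(const struct demo_mount *m)
{
        struct demo_inode ino = { .extent_mapped = m->opt_extents };
        return ino;
}

/* Existing files are mapped according to their own flag, not the mount. */
static inline bool demo_uses_extents(const struct demo_mount *m,
                                     const struct demo_inode *ino)
{
        (void)m;        /* the mount option plays no part in lookups */
        return ino->extent_mapped;
}
```

Under this assumption, a file created while mounted with "-o extents"
stays extent-mapped on a later mount without the option, and a file
created without it stays block-mapped.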

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:51                       ` Jeff Garzik
@ 2006-06-09 19:39 Gerrit Huizenga
  2006-06-09 19:45 ` [Ext2-devel] " Jeff Garzik
  2 siblings, 1 reply; 104+ messages in thread
From: Gerrit Huizenga @ 2006-06-09 19:39 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, Matthew Frost, ext2-devel, linux-kernel,
	Linus Torvalds, cmm, linux-fsdevel, Alex Tomas


On Fri, 09 Jun 2006 14:51:55 EDT, Jeff Garzik wrote:
> 
> PRECISELY.  So you should stop modifying a filesystem whose design is 
> admittedly _not_ modern!

So just how long do you think it would take to get a modern filesystem
into the hands of real users, supported by the distros?  From community
building, through design, development, testing, delivery?

gerrit

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 19:39 Gerrit Huizenga
@ 2006-06-09 19:45 ` Jeff Garzik
  0 siblings, 0 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 19:45 UTC (permalink / raw)
  To: Gerrit Huizenga
  Cc: Matthew Frost, Alex Tomas, Linus Torvalds, Andrew Morton,
	ext2-devel, linux-kernel, cmm, linux-fsdevel

Gerrit Huizenga wrote:
> On Fri, 09 Jun 2006 14:51:55 EDT, Jeff Garzik wrote:
>> PRECISELY.  So you should stop modifying a filesystem whose design is 
>> admittedly _not_ modern!
> 
> So just how long do you think it would take to get a modern filesystem
> into the hands of real users, supported by the distros?  From community
> building, through design, development, testing, delivery?

Start from a known working point, and keep it working...

	Jeff




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
@ 2006-06-09 18:55 Jeff Garzik
  2006-06-09 19:42 ` [Ext2-devel] " Gerrit Huizenga
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 18:55 UTC (permalink / raw)
  To: Michael Poole
  Cc: Andrew Morton, ext2-devel, linux-kernel, Christoph Hellwig, cmm,
	linux-fsdevel

Michael Poole wrote:
> Jeff Garzik writes:
> 
>> Andrew Morton wrote:
>>> Ted&co have been pretty good at avoiding compatibility problems.
>> Well, extents and 48bit make that track record demonstrably worse.
>>
>> Users are now forced to remember that, if they write to their
>> filesystem after using either $mmver or $korgver kernels, they are
>> locked out of using older kernels.
> 
> Users are also forced to remember that, if they use certain new
> distros or programs, they are locked out of using older kernels.  They
> are forced to remember that if they have certain newer hardware, they
> are locked out of using older kernels.  They are forced to remember
> that if they use ext3 (or XFS or JFS) _at all_ they are locked out of
> using older kernels.  Why single out this particular aspect of limited
> forward compatibility to harp on so much?

Because it's called backwards compat, when it isn't?
Because it is very difficult to find out which set of kernels you are 
locked out of?
Because the filesystem upgrade is stealthy, occurring as it does on the 
first data write?

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:55 Jeff Garzik
@ 2006-06-09 19:42 ` Gerrit Huizenga
  2006-06-09 20:00   ` Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Gerrit Huizenga @ 2006-06-09 19:42 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Michael Poole, Andrew Morton, ext2-devel, linux-kernel,
	Christoph Hellwig, cmm, linux-fsdevel

On Fri, 09 Jun 2006 14:55:56 EDT, Jeff Garzik wrote:
> 
> Because it's called backwards compat, when it isn't?
> Because it is very difficult to find out which set of kernels you are 
> locked out of?
> Because the filesystem upgrade is stealthy, occurring as it does on the 
> first data write?

Actually, the *only* point being contended here is running older
kernels on some newer filesystems (created originally with a newer
kernel), right?

Or do you have examples of where current kernels could not deal
with an ext3 feature at some point in time?

I would argue that 0.001% of all Linux *users* actually worry about
this - most of them are right here on the development mailing list.
So, that group is more vocal, for sure.  But, if it works for 99.99+%
users, aren't we still on the good path, from the point of view of
those people who actually *use* Linux the most?

gerrit

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 19:42 ` [Ext2-devel] " Gerrit Huizenga
@ 2006-06-09 20:00   ` Jeff Garzik
  2006-06-09 20:08     ` Alex Tomas
  2006-06-09 20:35     ` Theodore Tso
  0 siblings, 2 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 20:00 UTC (permalink / raw)
  To: Gerrit Huizenga
  Cc: Michael Poole, Andrew Morton, ext2-devel, linux-kernel,
	Christoph Hellwig, cmm, linux-fsdevel

Gerrit Huizenga wrote:
> On Fri, 09 Jun 2006 14:55:56 EDT, Jeff Garzik wrote:
>> Because it's called backwards compat, when it isn't?
>> Because it is very difficult to find out which set of kernels you are 
>> locked out of?
>> Because the filesystem upgrade is stealthy, occurring as it does on the 
>> first data write?
> 
> Actually, the *only* point being contended here is running older
> kernels on some newer filesystems (created originally with a newer
> kernel), right?
> 
> Or do you have examples of where current kernels could not deal
> with an ext3 feature at some point in time?
> 
> I would argue that 0.001% of all Linux *users* actually worry about
> this - most of them are right here on the development mailing list.
> So, that group is more vocal, for sure.  But, if it works for 99.99+%
> users, aren't we still on the good path, from the point of view of
> those people who actually *use* Linux the most?

The overall objection is to treating ext3 as a highly mutable, 
one-size-fits-all filesystem.

Maybe there is value in moving some reiser4 concepts -- a set of 
metadata+algorithm plugins -- to the VFS level.  I dunno.

But for ext3 specifically, it seems like bolting on extents, 48bit, 
delayed allocation, and other new features weren't really suited for the 
original ext2-style design.  Outside of the support (and marketing, 
because that's all version numbers are in the end) issues already 
mentioned, I think it falls into the nebulous realm of "taste."

Rather than taking another decade to slowly fix ext2 design decisions, 
why not move the process along a bit more rapidly?  Release early, 
release often...

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 20:00   ` Jeff Garzik
@ 2006-06-09 20:08     ` Alex Tomas
  2006-06-09 20:10       ` [Ext2-devel] " Jeff Garzik
  2006-06-09 20:35     ` Theodore Tso
  1 sibling, 1 reply; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 20:08 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, ext2-devel, linux-kernel, Michael Poole,
	Christoph Hellwig, Gerrit Huizenga, cmm, linux-fsdevel

>>>>> Jeff Garzik (JG) writes:

 JG> Rather than taking another decade to slowly fix ext2 design decisions, 
 JG> why not move the process along a bit more rapidly?  Release early, 
 JG> release often...

that could be true, if we were talking about something yet to be
designed, coded and tested.

thanks, Alex

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 20:08     ` Alex Tomas
@ 2006-06-09 20:10       ` Jeff Garzik
  0 siblings, 0 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 20:10 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Gerrit Huizenga, Andrew Morton, ext2-devel, linux-kernel,
	Michael Poole, Christoph Hellwig, cmm, linux-fsdevel

Alex Tomas wrote:
>>>>>> Jeff Garzik (JG) writes:
> 
>  JG> Rather than taking another decade to slowly fix ext2 design decisions, 
>  JG> why not move the process along a bit more rapidly?  Release early, 
>  JG> release often...
> 
> that could be true, if we were talking about something yet to be
> designed, coded and tested.

'cp ext3 ext4' already has its first two features:  extents and 48bit. 
And it works today.  Tested to the extent that the submitter has tested it.

	Jeff




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 20:00   ` Jeff Garzik
  2006-06-09 20:08     ` Alex Tomas
@ 2006-06-09 20:35     ` Theodore Tso
  2006-06-09 21:41       ` Jeff Garzik
  1 sibling, 1 reply; 104+ messages in thread
From: Theodore Tso @ 2006-06-09 20:35 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Gerrit Huizenga, Michael Poole, Andrew Morton, ext2-devel,
	linux-kernel, Christoph Hellwig, cmm, linux-fsdevel

On Fri, Jun 09, 2006 at 04:00:44PM -0400, Jeff Garzik wrote:
> But for ext3 specifically, it seems like bolting on extents, 48bit, 
> delayed allocation, and other new features weren't really suited for the 
> original ext2-style design.  Outside of the support (and marketing, 
> because that's all version numbers are in the end) issues already 
> mentioned, I think it falls into the nebulous realm of "taste."

If it is very much a matter of taste, why are you trying to dictate to
the ext2 developers how they choose to do things?  As long as it
works, and we haven't screwed up yet, I'd argue this falls into the
category of letting each subsystem decide how they best work.  The way
DaveM and the networking team works is quite different from how the
SCSI developers work or the XFS team works --- it's not a
one-size-fits-all sort of thing.

And I'd also dispute your "weren't really suited for the original
ext2-style design" comment.  Ext2/3 was always designed to be
extensible from the start, and we've added features quite
successfully for quite a while.

> Rather than taking another decade to slowly fix ext2 design decisions, 
> why not move the process along a bit more rapidly?  Release early, 
> release often...

I don't think it will be another decade, but yes, regardless of
whether we do a code fork or not, it will take time.  Basically, you
and the ext2 developers have a disagreement about whether or not a
code fork will actually move the process along more quickly or not.
Either way, we will be releasing early and often, so people can test
it out and comment on it.  Releasing patches to LKML is just the first
step in this process.

						- Ted

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 20:35     ` Theodore Tso
@ 2006-06-09 21:41       ` Jeff Garzik
  2006-06-09 21:45         ` [Ext2-devel] " Michael Poole
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 21:41 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Andrew Morton, ext2-devel, linux-kernel, Michael Poole,
	Christoph Hellwig, Gerrit Huizenga, cmm, linux-fsdevel

Theodore Tso wrote:
> And I'd also dispute your "weren't really suited for the original
> ext2-style design" comment.  Ext2/3 was always designed to be
> extensible from the start, and we've added features quite
> successfully for quite a while.

Although not the only disk format change, extents are a pretty big one. 
Will this be the last major on-disk format change?


>> Rather than taking another decade to slowly fix ext2 design decisions, 
>> why not move the process along a bit more rapidly?  Release early, 
>> release often...
> 
> I don't think it will be another decade, but yes, regardless of
> whether we do a code fork or not, it will take time.  Basically, you
> and the ext2 developers have a disagreement about whether or not a
> code fork will actually move the process along more quickly or not.
> Either way, we will be releasing early and often, so people can test
> it out and comment on it.  Releasing patches to LKML is just the first
> step in this process.

I don't see how a larger filesystem codebase could possibly move more 
quickly than a smaller codebase.  You'd have twice as many code paths to 
worry about.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 21:41       ` Jeff Garzik
@ 2006-06-09 21:45         ` Michael Poole
  2006-06-09 21:53           ` Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Michael Poole @ 2006-06-09 21:45 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Theodore Tso, Gerrit Huizenga, Andrew Morton, ext2-devel,
	linux-kernel, Christoph Hellwig, cmm, linux-fsdevel

Jeff Garzik writes:

> Theodore Tso wrote:
> > And I'd also dispute your "weren't really suited for the original
> > ext2-style design" comment.  Ext2/3 was always designed to be
> > extensible from the start, and we've added features quite
> > successfully for quite a while.
> 
> Although not the only disk format change, extents are a pretty big
> one. Will this be the last major on-disk format change?

You keep making "straw that broke the camel's back" type arguments
without saying why this particular straw (rather than the other
compatibility-breaking features that are already in ext3) is the one
that must not be allowed.  Is it a matter of taste, or is there some
objective threshold that extents cross?

> >> Rather than taking another decade to slowly fix ext2 design
> >> decisions, why not move the process along a bit more rapidly?
> >> Release early, release often...
> > I don't think it will be another decade, but yes, regardless of
> > whether we do a code fork or not, it will take time.  Basically, you
> > and the ext2 developers have a disagreement about whether or not a
> > code fork will actually move the process along more quickly or not.
> > Either way, we will be releasing early and often, so people can test
> > it out and comment on it.  Releasing patches to LKML is just the first
> > step in this process.
> 
> I don't see how a larger filesystem codebase could possibly move more
> quickly than a smaller codebase.  You'd have twice as many code paths
> to worry about.

This is also the case when you cut and paste an entire filesystem's
source code, as has been mentioned several times in this thread.

Michael Poole

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 21:45         ` [Ext2-devel] " Michael Poole
@ 2006-06-09 21:53           ` Jeff Garzik
  0 siblings, 0 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 21:53 UTC (permalink / raw)
  To: Michael Poole
  Cc: Theodore Tso, Gerrit Huizenga, Andrew Morton, ext2-devel,
	linux-kernel, Christoph Hellwig, cmm, linux-fsdevel

Michael Poole wrote:
> Jeff Garzik writes:
> 
>> Theodore Tso wrote:
>>> And I'd also dispute your "weren't really suited for the original
>>> ext2-style design" comment.  Ext2/3 was always designed to be
>>> extensible from the start, and we've added features quite
>>> successfully for quite a while.
>> Although not the only disk format change, extents are a pretty big
>> one. Will this be the last major on-disk format change?
> 
> You keep making "straw that broke the camel's back" type arguments
> without saying why this particular straw (rather than the other
> compatibility-breaking features that are already in ext3) is the one
> that must not be allowed.  Is it a matter of taste, or is there some
> objective threshold that extents cross?

Yes, it's not a small change to the on-disk format.

If you write tools that read an ext3 filesystem, you won't be able to 
read file data at all, without updating your code.

That's a much bigger deal than say 32-bit uids.
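
To make the point about tools concrete, here is a rough sketch of the
kind of check a reader tool would need.  It assumes the extents format is
advertised through an incompatible-feature bit in the superblock, the way
ext2/ext3 flag other format changes; the DEMO_* names and bit values are
illustrative and are not copied from the patches:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values; these are not copied from the patches. */
#define DEMO_INCOMPAT_FILETYPE  0x0002u
#define DEMO_INCOMPAT_EXTENTS   0x0040u

/* Incompatible features this hypothetical reader tool understands. */
#define DEMO_TOOL_SUPPORTED     (DEMO_INCOMPAT_FILETYPE)

/*
 * Return true when every incompatible-feature bit set in the
 * superblock is one the tool knows how to handle.
 */
static inline bool demo_tool_can_read(uint32_t feature_incompat)
{
        return (feature_incompat & ~DEMO_TOOL_SUPPORTED) == 0;
}
```

A tool written before extents existed would see an unknown bit and have
to refuse the filesystem until DEMO_TOOL_SUPPORTED (and the block-mapping
code behind it) is updated.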

	Jeff




^ permalink raw reply	[flat|nested] 104+ messages in thread

* [RFC 0/13] extents and 48bit ext3
@ 2006-06-09  1:20 Mingming Cao
  2006-06-09  2:40 ` Valdis.Kletnieks
                   ` (2 more replies)
  0 siblings, 3 replies; 104+ messages in thread
From: Mingming Cao @ 2006-06-09  1:20 UTC (permalink / raw)
  To: linux-kernel, ext2-devel, linux-fsdevel

The current ext3 filesystem is limited to 8TB (with a 4k block size). This
is practically not enough for the growing need for bigger storage as
disks get larger over the next few years (or even now).
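
As a rough check of these size limits, the sketch below computes the
addressable bytes for a given block-number width.  It assumes the 8TB
figure corresponds to 31 usable block-number bits with 4k blocks (an
assumption, not stated in this mail), and compares that with the 48 bit
block numbers proposed in this series; demo_max_fs_bytes is a
hypothetical helper:

```c
#include <stdint.h>

/*
 * Bytes addressable with `bits`-wide block numbers and blocks of
 * 2^block_shift bytes.  Only valid while bits + block_shift < 64.
 */
static inline uint64_t demo_max_fs_bytes(unsigned bits, unsigned block_shift)
{
        return (uint64_t)1 << (bits + block_shift);
}
```

With 4k blocks (block_shift = 12), 31 bits gives 2^43 bytes, i.e. 8TB,
and 48 bits gives 2^60 bytes, i.e. 1024 PB.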

To address this need, there is a joint effort from RedHat, ClusterFS, IBM
and BULL to move ext3 from a 32 bit filesystem to a 48 bit filesystem,
expanding the ext3 filesystem limit from 8TB today to 1024 PB. The 48 bit
ext3 is built on top of the extent map changes for ext3, originally from
Alex Tomas. In short, the new ext3 on-disk extents format is:

On disk extents format:
/*
  * this is extent on-disk structure
  * it's used at the bottom of the tree
  */
struct ext3_extent {
        __le32  ee_block;       /* first logical block extent covers */
        __le16  ee_len;         /* number of blocks covered by extent */
        __le16  ee_start_hi;    /* high 16 bits of physical block */
        __le32  ee_start;       /* low 32 bits of physical block */
};
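
As an illustration of how the two start fields form one 48 bit physical
block number, here is a small sketch.  It uses a host-order mirror of the
structure (demo_extent) and two hypothetical helpers; the little-endian
conversion that the real __le16/__le32 fields need is assumed to have
been done already:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Host-order mirror of the on-disk extent quoted above.  The real
 * structure uses little-endian __le16/__le32 fields; byte-order
 * conversion is assumed to have happened before this point.
 */
struct demo_extent {
        uint32_t ee_block;      /* first logical block extent covers */
        uint16_t ee_len;        /* number of blocks covered by extent */
        uint16_t ee_start_hi;   /* high 16 bits of physical block */
        uint32_t ee_start;      /* low 32 bits of physical block */
};

/* The record is 12 bytes: 4 + 2 + 2 + 4, with no padding. */
static_assert(sizeof(struct demo_extent) == 12, "extent must be 12 bytes");

/* Hypothetical helper: join the two halves into one 48 bit block number. */
static inline uint64_t demo_extent_pblock(const struct demo_extent *ex)
{
        return ((uint64_t)ex->ee_start_hi << 32) | ex->ee_start;
}

/* Hypothetical helper: split a 48 bit block number back into the halves. */
static inline void demo_extent_store_pblock(struct demo_extent *ex, uint64_t pb)
{
        ex->ee_start = (uint32_t)(pb & 0xffffffffu);
        ex->ee_start_hi = (uint16_t)((pb >> 32) & 0xffffu);
}
```

Keeping the low 32 bits in ee_start means an extent whose ee_start_hi is
zero describes the same block numbers a 32 bit filesystem can reach.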

A series of patches was posted to the ext2-devel list over the last month
and has been reviewed.  This is the updated full series of patches to
support 48 bit ext3 based on the extent map. Patches are against the
2.6.17-rc6 kernel, and can be found at
http://ext2.sourceforge.net/48bitext3/patches/patches-2.6.17-rc6-06082006/

[patch 1/13] percpu_counter_longlong.patch
percpu count data type changes to support 64 bit ext3 free blocks count

[patch 2/13] ext3_check_sector_t_overflow.patch
sector_t overflow check for 32bit/48bit ext3 at mount/resize time
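
The idea behind such a check can be sketched as follows.  This is not the
code from the patch; demo_fits_in_sector_t is a hypothetical helper that
takes the width of the sector type as a parameter (32 or 64) and asks
whether the filesystem's total count of 512-byte sectors is representable:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: can a filesystem of `blocks` blocks, each
 * 2^block_shift bytes (block_shift >= 9), be addressed in 512-byte
 * sectors by a sector type that is `sector_bits` wide?
 */
static inline bool demo_fits_in_sector_t(uint64_t blocks, unsigned block_shift,
                                         unsigned sector_bits)
{
        unsigned shift = block_shift - 9;   /* sectors per block, as a shift */
        uint64_t sectors = blocks << shift;

        /* Detect bits lost by the shift itself. */
        if ((sectors >> shift) != blocks)
                return false;
        if (sector_bits >= 64)
                return true;
        return sectors <= (((uint64_t)1 << sector_bits) - 1);
}
```

A mount or resize path could refuse the operation when a check like this
fails, rather than silently truncating block numbers.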

[patch 3/13] ext3_fsblk_t_fixes.patch
Define ext3 filesystem and group block types (ext3_fsblk_t,
ext3_grpblk_t), and fix in-kernel ext3 block types (from int type to
ext3_fsblk_t) to support 32bit ext3.

[patch 4/13] ext3_convert_blks_to_fsblk_t.patch
convert the rest of ext3 filesystem blocks to ext3_fsblk_t

patches 1-4 are currently in mm tree

[patch 5/13] sector_fmt.patch
sector_t type format string for all arch.

[patch 6/13] ext3_fsblk_sector_t.patch
support >32 bit fs block type in kernel (convert ext3_fsblk_t to
sector_t)

[patch 7/13] 64bit_jbd_core.patch
Core 64 bit JBD changes

[patch 8/13] sector_t-jbd.patch
JBD layer in-kernel block variables type fixes to support >32
bit block number and convert to sector_t type.

#extent map patches
[patch 9/13] ext3-extents.patch
core extent map support

[patch 10/13] ext3-extents-48bit.patch
Add full 48 bit physical block support based on extents.

[patch 11/13] ext3-extents-ext3_fsblk_t.patch
convert block types in extents to ext3_fsblk_t

[patch 12/13] ext3_48bit_i_file_acl.pat
48 bit on-disk i_file_acl to support xattr for 48 bit ext3

[patch 13/13] 64bit-metadata
On-disk and in-kernel super block changes to support >32
bit free blocks numbers.

Any comments and feedback are appreciated!

Mingming

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09  1:20 Mingming Cao
@ 2006-06-09  2:40 ` Valdis.Kletnieks
  2006-06-09  8:20   ` Andreas Dilger
  2006-06-09  2:49 ` Jeff Garzik
  2006-06-09  9:13 ` Christoph Hellwig
  2 siblings, 1 reply; 104+ messages in thread
From: Valdis.Kletnieks @ 2006-06-09  2:40 UTC (permalink / raw)
  To: cmm; +Cc: linux-kernel, ext2-devel, linux-fsdevel

On Thu, 08 Jun 2006 18:20:54 PDT, Mingming Cao said:
> The current ext3 filesystem is limited to 8TB (with a 4k block size). This
> is practically not enough for the growing need for bigger storage as
> disks get larger over the next few years (or even now).
> 
> To address this need, there is a joint effort from RedHat, ClusterFS, IBM
> and BULL to move ext3 from a 32 bit filesystem to a 48 bit filesystem,
> expanding the ext3 filesystem limit from 8TB today to 1024 PB. The 48 bit
> ext3 is built on top of the extent map changes for ext3, originally from
> Alex Tomas. In short, the new ext3 on-disk extents format is:

which implies matching changes to mkfs.ext2 and possibly mount..

> Any comments and feedback are appreciated!

Somebody else was recently discussing a set of patches to ext3 for
extents+delalloc+mballoc patches - is this work compatible with that?

Also, a pointer to the matching userspace patches would help anybody
who's gung-ho enough to test the code....



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09  2:40 ` Valdis.Kletnieks
@ 2006-06-09  8:20   ` Andreas Dilger
  2006-06-09 18:35     ` [Ext2-devel] " Stephen C. Tweedie
  0 siblings, 1 reply; 104+ messages in thread
From: Andreas Dilger @ 2006-06-09  8:20 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: cmm, linux-kernel, ext2-devel, linux-fsdevel

On Jun 08, 2006  22:40 -0400, Valdis.Kletnieks@vt.edu wrote:
> On Thu, 08 Jun 2006 18:20:54 PDT, Mingming Cao said:
> > To address this need, there is a joint effort from RedHat, ClusterFS, IBM
> > and BULL to move ext3 from a 32 bit filesystem to a 48 bit filesystem,
> > expanding the ext3 filesystem limit from 8TB today to 1024 PB. The 48 bit
> > ext3 is built on top of the extent map changes for ext3, originally from
> > Alex Tomas. In short, the new ext3 on-disk extents format is:
> 
> which implies matching changes to mkfs.ext2 and possibly mount..

The extents format doesn't need any support from mke2fs.  Currently this
is activated by a mount option "-o extents", so it won't be used until
a system administrator actively enables it.

> > Any comments and feedback are appreciated!
> 
> Somebody else was recently discussing a set of patches to ext3 for
> extents+delalloc+mballoc patches - is this work compatible with that?

Yes, completely compatible (author is the same person).  We have all been
working to get these improvements into the vanilla kernel so that everyone
can benefit from the improved performance.  These patches are just the
start - the mballoc and delalloc patches are follow-on patches, but they
do not affect the on-disk format just the in-memory implementation of
block allocation.

> Also, a pointer to the matching userspace patches would help anybody
> who's gung-ho enough to test the code....

They were posted to the ext2-devel mailing list previously, or you can
download a patched RPM at ftp://ftp.lustre.org/pub/lustre/other/e2fsprogs/
(the extent support is making its way into the official e2fsprogs also).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09  8:20   ` Andreas Dilger
@ 2006-06-09 18:35     ` Stephen C. Tweedie
  2006-06-09 19:20       ` Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Stephen C. Tweedie @ 2006-06-09 18:35 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Valdis.Kletnieks, linux-fsdevel, ext2-devel@lists.sourceforge.net,
	Mingming Cao, linux-kernel, Stephen Tweedie

Hi,

On Fri, 2006-06-09 at 02:20 -0600, Andreas Dilger wrote:

> > which implies matching changes to mkfs.ext2 and possibly mount..
> 
> The extents format doesn't need any support from mke2fs.  Currently this
> is activated by a mount option "-o extents", so it won't be used until
> a system administrator actively enables it.

It does need support from e2fsprogs, though; patches have been posted to
ext2-devel and are available on 

	http://www.bullopensource.org/ext4/index.html

though there is work left to do, especially to improve fsck's ability to
repair partially-damaged extent trees.

--Stephen



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:35     ` [Ext2-devel] " Stephen C. Tweedie
@ 2006-06-09 19:20       ` Jeff Garzik
  2006-06-09 19:28         ` Alex Tomas
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 19:20 UTC (permalink / raw)
  To: linux-fsdevel, ext2-devel@lists.sourceforge.net, linux-kernel

Stephen C. Tweedie wrote:
> 	http://www.bullopensource.org/ext4/index.html


heh, some ext3 developers are even calling it ext4 already ;-)

	Jeff



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 19:20       ` Jeff Garzik
@ 2006-06-09 19:28         ` Alex Tomas
  0 siblings, 0 replies; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 19:28 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-fsdevel, ext2-devel@lists.sourceforge.net, linux-kernel

>>>>> Jeff Garzik (JG) writes:

 JG> Stephen C. Tweedie wrote:
 >> http://www.bullopensource.org/ext4/index.html


 JG> heh, some ext3 developers are even calling it ext4 already ;-)

I bet you once proposed this name a few years ago ;)


thanks, Alex

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09  1:20 Mingming Cao
  2006-06-09  2:40 ` Valdis.Kletnieks
@ 2006-06-09  2:49 ` Jeff Garzik
  2006-06-09  8:35   ` Andreas Dilger
  2006-06-09  9:13 ` Christoph Hellwig
  2 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09  2:49 UTC (permalink / raw)
  To: cmm, Andrew Morton, Linus Torvalds
  Cc: linux-kernel, ext2-devel, linux-fsdevel

Mingming Cao wrote:
> The current ext3 filesystem is limited to 8TB (with a 4k block size). This
> is practically not enough for the growing need for bigger storage as
> disks get larger over the next few years (or even now).
> 
> To address this need, there is a joint effort from RedHat, ClusterFS, IBM
> and BULL to move ext3 from a 32 bit filesystem to a 48 bit filesystem,
> expanding the ext3 filesystem limit from 8TB today to 1024 PB. The 48 bit
> ext3 is built on top of the extent map changes for ext3, originally from
> Alex Tomas. In short, the new ext3 on-disk extents format is:

One of my common complaints about massive ext3 updates such as this is 
the ever-growing "which ext3 filesystem am I mounting?" problem.

I really think extents and 48bit-ness should imply
	cp -a fs/ext3 fs/ext4
and go from there.

IMHO the ext3 back-compat situation is already really hairy, with all 
the features added since the original ext3 release.

The alternative is continual bloating of ext3, and on filesystems, 
inodes which are progressively upgraded -- meaning any use of a prior 
kernel implies that you can only read a subset of your [meta]data, if 
the back-compat code doesn't block the mount entirely.

People (including me) still switch back and forth between ext2 and ext3 
mounts of the same filesystem on occasion.  I think creating an "ext4" 
would allow for greater developer flexibility in implementing new 
features and ditching old ones -- while also emphasizing to the user 
that switching back and forth between ext4 and ext[23] is a bad idea.

Overall, after applying extent (and 48bit) patches, I think it is wrong 
to keep calling it ext3.  That will break some existing user 
assumptions, and continue to restrict developers' freedom to implement 
nifty new features.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09  2:49 ` Jeff Garzik
@ 2006-06-09  8:35   ` Andreas Dilger
  2006-06-09 15:08     ` Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Andreas Dilger @ 2006-06-09  8:35 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, ext2-devel, linux-kernel, Linus Torvalds, cmm,
	linux-fsdevel

On Jun 08, 2006  22:49 -0400, Jeff Garzik wrote:
> One of my common complaints about massive ext3 updates such as this is 
> the ever-growing "which ext3 filesystem am I mounting?" problem.
> 
> I really think extents and 48bit-ness should imply
> 	cp -a fs/ext3 fs/ext4
> and go from there.

The problem with this approach (as seen with ext2 and ext3) is that one
tree or the other gets stale w.r.t. bug fixes and now we have the case
where ext2 has a noticeably different implementation in some areas and
bug fixes are no longer trivial to apply to both trees.

I think all of the ext3 maintainers think this split was a bad idea in
hindsight, and having an ext3 mode where it can mount without a journal
would be much more desirable.

> IMHO the ext3 back-compat situation is already really hairy, with all 
> the features added since the original ext3 release.

While partially true, ext2/ext3 has a very good history w.r.t. compatibility
(with one exception being the EAs on symlinks problem that slipped through
with selinux).

Yes, the extents format will be incompatible with older ext3, but it isn't
enabled by default so it will be completely up to the sysadmin when they
make their filesystem incompatible.  They also won't impact any existing
files.  The earlier extents support gets into a kernel.org kernel, the
more systems will be able to mount a filesystem with the changes when
they become widely used.

All of the other features that are going to be introduced are only going
to be applicable at format time (filesystems larger than 16TB), or when
exceeding the limits of the current ext3 support (e.g. files larger than 2TB
in size).
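
A rough sketch of where these limits come from, assuming 4KB blocks,
32-bit block numbers in the current format, and a 32-bit per-file count
of 512-byte sectors (the helper name is made up for illustration and is
not kernel code):

```c
#include <stdint.h>

/*
 * Largest size, in bytes, addressable with `bits`-wide unit numbers
 * and `unit_size`-byte units.  Hypothetical helper; only meaningful
 * while the result fits in 64 bits.
 */
static uint64_t max_addressable_bytes(unsigned bits, uint64_t unit_size)
{
	return ((uint64_t)1 << bits) * unit_size;
}

/*
 * max_addressable_bytes(32, 4096) is 2^44 bytes: the 16TB filesystem limit
 * max_addressable_bytes(48, 4096) is 2^60 bytes: what 48-bit blocks allow
 * max_addressable_bytes(32,  512) is 2^41 bytes: the 2TB file limit
 */
```

With smaller block sizes the filesystem limit shrinks proportionally.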

> People (including me) still switch back and forth between ext2 and ext3 
> mounts of the same filesystem on occasion.  I think creating an "ext4" 
> would allow for greater developer flexibility in implementing new 
> features and ditching old ones -- while also emphasizing to the user 
> that switching back and forth between ext4 and ext[23] is a bad idea.

While this is partly true, one of the big benefits is that you can
transparently upgrade your system to use the new features and improve
performance without a long outage window.  Having a completely separate
ext4 filesystem doesn't improve the compatibility story at all.  There
has been renewed discussion on implementing "mounting ext3 without a
journal", just for a recovery mode, because ext2 will not be modified
to get all of these features (running e2fsck on a huge filesystem each
reboot would be insane).

> Overall, after applying extent (and 48bit) patches, I think it is wrong 
> to keep calling it ext3.  That will break some existing user 
> assumptions, and continue to restrict developers' freedom to implement 
> nifty new features.

Just FYI, all of the ext3 developers are on board with this patch series
and it has been discussed and reviewed for many weeks already; it isn't
just being pushed by one party.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09  8:35   ` Andreas Dilger
@ 2006-06-09 15:08     ` Jeff Garzik
  2006-06-09 15:25       ` Jeff Garzik
                         ` (2 more replies)
  0 siblings, 3 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 15:08 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: cmm, Andrew Morton, Linus Torvalds, linux-kernel, ext2-devel,
	linux-fsdevel

Please fix your mailer to stop creating bogus Mail-Followup-To headers, 
headers which exclude the original poster, and cause compliant MUAs to 
incorrectly build To/CC.

Andreas Dilger wrote:
> On Jun 08, 2006  22:49 -0400, Jeff Garzik wrote:
>> One of my common complaints about massive ext3 updates such as this is 
>> the ever-growing "which ext3 filesystem am I mounting?" problem.
>>
>> I really think extents and 48bit-ness should imply
>> 	cp -a fs/ext3 fs/ext4
>> and go from there.
> 
> The problem with this approach (as seen with ext2 and ext3) is that one
> tree or the other gets stale w.r.t. bug fixes and now we have the case
> where ext2 has a noticeably different implementation in some areas and
> bug fixes are no longer trivial to apply to both trees.
> 
> I think all of the ext3 maintainers think this split was a bad idea in
> hindsight, and having an ext3 mode where it can mount without a journal
> would be much more desirable.

Please look beyond just ext2/3.  Other filesystems which have "version 
1", "version 2", "version 3", ... formats are all nasty as hell.  The 
end-result bloated code essentially supports several filesystems, all 
within the same code base, and it's a nightmare of ugliness.

Further, it's not only bloated, but slow.  The code inevitably winds up 
in one of two forms:

	if (spiffy new-feature metadata)
		...
	else if (updated metadata)
		...
	else /* original metadata */
		...

_or_ you add a level of indirection, by creating internal-to-the-fs 
pointer operations.
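
A minimal sketch of that second form, using hypothetical names (none of
these are real ext3 symbols): each on-disk format supplies its own
function table, and every block lookup pays for an indirect call.

```c
#include <stdint.h>

struct toy_inode;

/* Per-format operations, chosen once when the inode is read in. */
struct toy_map_ops {
	uint64_t (*map_block)(const struct toy_inode *inode, uint32_t lblk);
};

struct toy_inode {
	const struct toy_map_ops *ops;
	uint64_t start;		/* extent format: first physical block */
	uint64_t direct[12];	/* original format: direct block pointers */
};

/* Original metadata: one pointer per block (direct blocks only here). */
static uint64_t toy_map_indirect(const struct toy_inode *inode, uint32_t lblk)
{
	return lblk < 12 ? inode->direct[lblk] : 0;
}

/* New metadata: a single contiguous extent. */
static uint64_t toy_map_extent(const struct toy_inode *inode, uint32_t lblk)
{
	return inode->start + lblk;
}

static const struct toy_map_ops toy_indirect_ops = { toy_map_indirect };
static const struct toy_map_ops toy_extent_ops = { toy_map_extent };

/* Every caller goes through the pointer, whatever the format. */
static uint64_t toy_bmap(const struct toy_inode *inode, uint32_t lblk)
{
	return inode->ops->map_block(inode, lblk);
}
```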

Stuffing more and more features into fs/ext3 means you are following the 
path that leads to reiser4...  where EVERYTHING under the hood is 
mutable, all within fs/ext3.

>> IMHO the ext3 back-compat situation is already really hairy, with all 
>> the features added since the original ext3 release.
> 
> While partially true, ext2/ext3 has a very good history w.r.t. compatibility
> (with one exception being the EAs on symlinks problem that slipped through
> with selinux).
> 
> Yes, the extents format will be incompatible with older ext3, but it isn't
> enabled by default so it will be completely up to the sysadmin when they
> make their filesystem incompatible.  They also won't impact any existing
> files.  The earlier extents support gets into a kernel.org kernel, the
> more systems will be able to mount a filesystem with the changes when
> they become widely used.
> 
> All of the other features that are going to be introduced are only going
> to be applicable at format time (filesystems larger than 16TB), or when
> exceeding the limits of the current ext3 support (e.g. files larger than 2TB
> in size).

Yet more progressive incompatibility, yet more

	if (metadata v2)
		...
	else /* metadata v1 */
		...

Why do you insist upon calling the end result ext3, when the truth is 
that you are slowly rewriting ext3?

As time progresses, more and more admins must ask themselves the 
question "what flavor of ext3 filesystem is on my hard drive?"

Here's a key question for ext3 developers, which I bet has no answer: 
when is it enough?  Is the plan to continually introduce incompatible 
features into ext3, over time, ad infinitum?

>> People (including me) still switch back and forth between ext2 and ext3 
>> mounts of the same filesystem on occasion.  I think creating an "ext4" 
>> would allow for greater developer flexibility in implementing new 
>> features and ditching old ones -- while also emphasizing to the user 
>> that switching back and forth between ext4 and ext[23] is a bad idea.
> 
> While this is partly true, one of the big benefits is that you can
> transparently upgrade your system to use the new features and improve
> performance without a long outage window.  Having a completely separate

Changing the name to ext4 doesn't erase this capability.

> ext4 filesystem doesn't improve the compatibility story at all.  There
> has been renewed discussion on implementing "mounting ext3 without a
> journal", just for a recovery mode, because ext2 will not be modified
> to get all of these features (running e2fsck on a huge filesystem each
> reboot would be insane).

So now you are going backwards, and implementing ext2-within-ext3?

Are you ready to admit, yet, that ext3 is 100% mutable in the minds of 
ext3 developers?  Why not implement the minix filesystem format within 
ext3, at this point?  We could call it a "plugin", I bet.

>> Overall, after applying extent (and 48bit) patches, I think it is wrong 
>> to keep calling it ext3.  That will break some existing user 
>> assumptions, and continue to restrict developers' freedom to implement 
>> nifty new features.
> 
> Just FYI, all of the ext3 developers are on board with this patch series
> and it has been discussed and reviewed for many weeks already; it isn't
> just being pushed by one party.

That is completely irrelevant to this thread.

If all the ext3 developers are on board, that just implies that there is 
no clear definition of what "ext3" really means.  With this patch 
series, and with future plans described here and elsewhere, the name 
"ext3" will become more and more meaningless.  It could mean _any_ of 
several filesystem metadata variants, and the admin will have no clue 
which variant they are talking to until they try to mount the blkdev 
(and possibly fail the mount).

At SOME point, clueful developers will say "we should better concentrate 
our energy on a new filesystem."

But I see no one at all defining that "some point."

At some point you are beating a dead horse.  At some point, you are 
pushing features into a filesystem that was never designed to support 
said features.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:08     ` Jeff Garzik
@ 2006-06-09 15:25       ` Jeff Garzik
  2006-06-09 15:40         ` Linus Torvalds
  2006-06-09 15:28       ` Alex Tomas
  2006-06-09 20:32       ` Stephen C. Tweedie
  2 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 15:25 UTC (permalink / raw)
  To: linux-kernel, ext2-devel, linux-fsdevel
  Cc: Andrew Morton, Linus Torvalds, cmm, Andreas Dilger

Overall, I'm surprised that ext3 developers don't see any of the 
problems related to progressive, stealth filesystem upgrades.

Users are never given a clear indication of when their metadata is being 
upgraded, there is no clear "line of demarcation" they cross, when they 
start using extents.

Since there is no user-visible fs upgrade event, users do not have a 
clear picture of what features are being used -- which means they are 
kept in the dark about which kernels are OK to use on their data.

Do you guys honestly expect users to keep track of which kernels added 
specific ext3 features?

This is why other enterprise filesystems have clear "fs version 1", "fs 
version 2" points across which a user migrates.  ext3's feature-flags 
approach just means that there are a million combinations of potential 
old-and-new features, in-tree and third party, all of which must be 
supported.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:25       ` Jeff Garzik
@ 2006-06-09 15:40         ` Linus Torvalds
  2006-06-09 15:47           ` Jeff Garzik
  2006-06-09 16:10           ` Alex Tomas
  0 siblings, 2 replies; 104+ messages in thread
From: Linus Torvalds @ 2006-06-09 15:40 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: linux-kernel, ext2-devel, linux-fsdevel, Andreas Dilger, cmm,
	Andrew Morton

On Fri, 9 Jun 2006, Jeff Garzik wrote:
>
> Overall, I'm surprised that ext3 developers don't see any of the problems
> related to progressive, stealth filesystem upgrades.

Hey, they're used to it - they've been doing it for a long time.

In fact, ext3 wouldn't be ext3 unless I (and perhaps a few others) had 
insisted on it. People wanted to try to upgrade ext2 in place.

And they've been upgrading it in-place for a long time.

Now, there are unquestionably advantages to that approach too, but as you 
say, there are absolutely tons of disadvantages too. Bugs get much much 
subtler, and more disastrous for old users that don't even want the new 
features.

Quite frankly, at this point, there's no way in hell I believe we can do 
major surgery on ext3. It's the main filesystem for a lot of users, and 
it's just not worth the instability worries unless it's something very 
obviously transparent.

I wouldn't mind an ext4 (that hopefully drops some of the features of 
ext3, and might not downgrade to ext2 on errors, for example).

			Linus

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:40         ` Linus Torvalds
@ 2006-06-09 15:47           ` Jeff Garzik
  2006-06-09 15:55             ` Alex Tomas
  2006-06-09 16:10           ` Alex Tomas
  1 sibling, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 15:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, ext2-devel, linux-fsdevel, Andreas Dilger, cmm,
	Andrew Morton

Linus Torvalds wrote:
> 
> On Fri, 9 Jun 2006, Jeff Garzik wrote:
>> Overall, I'm surprised that ext3 developers don't see any of the problems
>> related to progressive, stealth filesystem upgrades.
> 
> Hey, they're used to it - they've been doing it for a long time.

Agreed, but my argument is that extents are a Big Deal.

Think about The Experience:  Suddenly users that could use 2.4.x and 
2.6.x are locked into 2.6.18+, by the simple and common act of writing 
to a file.

No bells and whistles go off...

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:47           ` Jeff Garzik
@ 2006-06-09 15:55             ` Alex Tomas
  2006-06-09 15:56               ` Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 15:55 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, ext2-devel, linux-kernel, Linus Torvalds, cmm,
	linux-fsdevel, Andreas Dilger

>>>>> Jeff Garzik (JG) writes:

 JG> Think about The Experience:  Suddenly users that could use 2.4.x and 
 JG> 2.6.x are locked into 2.6.18+, by the simple and common act of writing 
 JG> to a file.

sorry to repeat, but if they simply try 2.6.18, they won't get extents.
instead, they must specify the extents mount option. and at that point
it must be clear to them that this is a way to get an incompatible fs.

thanks, Alex

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:55             ` Alex Tomas
@ 2006-06-09 15:56               ` Jeff Garzik
  2006-06-09 16:07                 ` Alex Tomas
  2006-06-09 20:52                 ` Stephen C. Tweedie
  0 siblings, 2 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 15:56 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Andrew Morton, ext2-devel, linux-kernel, Linus Torvalds, cmm,
	linux-fsdevel, Andreas Dilger

Alex Tomas wrote:
>>>>>> Jeff Garzik (JG) writes:
> 
>  JG> Think about The Experience:  Suddenly users that could use 2.4.x and 
>  JG> 2.6.x are locked into 2.6.18+, by the simple and common act of writing 
>  JG> to a file.
> 
> sorry to repeat, but if they simply try 2.6.18, they won't get extents.
> instead, they must specify the extents mount option. and at that point
> it must be clear to them that this is a way to get an incompatible fs.

Think about how this will be deployed in production, long term.

If extents are not made default at some point, then no one will use the 
feature, and it should not be merged.

And when extents are default, you have this blizzard-of-feature-flags 
stealth upgrade event occur _sometime_ after they boot into the new fs 
for the first time.  And then when they want to boot another kernel, 
they have to dig down a feature matrix, and figure out which ext3 
codebase will work for them.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:56               ` Jeff Garzik
@ 2006-06-09 16:07                 ` Alex Tomas
  2006-06-09 16:09                   ` [Ext2-devel] " Jeff Garzik
  2006-06-09 18:04                   ` Matthew Frost
  2006-06-09 20:52                 ` Stephen C. Tweedie
  1 sibling, 2 replies; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 16:07 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, ext2-devel, linux-kernel, Linus Torvalds, cmm,
	linux-fsdevel, Alex Tomas, Andreas Dilger

>>>>> Jeff Garzik (JG) writes:

 JG> Think about how this will be deployed in production, long term.

 JG> If extents are not made default at some point, then no one will use
 JG> the feature, and it should not be merged.

sorry, I disagree. for example, NUMA isn't default and shouldn't be.
but we have it in the tree and any one may choose to use it. the same
with extents. let's have it in. but let's make clear it's experimental,
it makes sense for large files only, it isn't backward compatible and
so on.

thanks, Alex

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:07                 ` Alex Tomas
@ 2006-06-09 16:09                   ` Jeff Garzik
  2006-06-09 18:04                   ` Matthew Frost
  1 sibling, 0 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 16:09 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Linus Torvalds, Andrew Morton, ext2-devel, linux-kernel, cmm,
	linux-fsdevel, Andreas Dilger

Alex Tomas wrote:
>>>>>> Jeff Garzik (JG) writes:
> 
>  JG> Think about how this will be deployed in production, long term.
> 
>  JG> If extents are not made default at some point, then no one will use
>  JG> the feature, and it should not be merged.
> 
> sorry, I disagree. for example, NUMA isn't default and shouldn't be.
> but we have it in the tree and any one may choose to use it. the same
> with extents. let's have it in. but let's make clear it's experimental,
> it makes sense for large files only, it isn't backward compatible and
> so on.

NUMA _is_ on by default, in newer hardware kernels :)  K8 is NUMA by 
default, remember.

But anyway...  the "it's experimental" argument is _completely_ 
irrelevant.  You have to think about the day when it is not, and how 
that will get deployed, and what are the potential problems that will 
arise from deployment.

	Jeff




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:07                 ` Alex Tomas
  2006-06-09 16:09                   ` [Ext2-devel] " Jeff Garzik
@ 2006-06-09 18:04                   ` Matthew Frost
  2006-06-09 18:14                     ` Andreas Dilger
  1 sibling, 1 reply; 104+ messages in thread
From: Matthew Frost @ 2006-06-09 18:04 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Jeff Garzik, Linus Torvalds, Andrew Morton, ext2-devel,
	linux-kernel, cmm, linux-fsdevel, Andreas Dilger

Alex Tomas wrote:
>>>>>> Jeff Garzik (JG) writes:
> 
>  JG> Think about how this will be deployed in production, long term.
> 
>  JG> If extents are not made default at some point, then no one will use
>  JG> the feature, and it should not be merged.
> 
> sorry, I disagree. for example, NUMA isn't default and shouldn't be.
> but we have it in the tree and any one may choose to use it.

NUMA is designed to cope with a hardware feature, which not everybody 
has.  Filesystem upgrades are not qualitatively similar; it does not 
depend on one's hardware design as to whether one uses ext3, let alone 
extents.  Your logic is faulty.

> the same
> with extents. let's have it in. but let's make clear it's experimental,
> it makes sense for large files only, it isn't backward compatible and
> so on.
> 
> thanks, Alex


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:04                   ` Matthew Frost
@ 2006-06-09 18:14                     ` Andreas Dilger
  2006-06-09 18:51                       ` Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Andreas Dilger @ 2006-06-09 18:14 UTC (permalink / raw)
  To: Matthew Frost
  Cc: Alex Tomas, Jeff Garzik, Linus Torvalds, Andrew Morton,
	ext2-devel, linux-kernel, cmm, linux-fsdevel

On Jun 09, 2006  13:04 -0500, Matthew Frost wrote:
> Alex Tomas wrote:
> >sorry, I disagree. for example, NUMA isn't default and shouldn't be.
> >but we have it in the tree and any one may choose to use it.
> 
> NUMA is designed to cope with a hardware feature, which not everybody 
> has.  Filesystem upgrades are not qualitatively similar; it does not 
> depend on one's hardware design as to whether one uses ext3, let alone 
> extents.  Your logic is faulty.

If you have a > 8TB block device (which is common in large RAID devices
today, and will be a single disk in a couple of years) then it is important
that your filesystem work with this block device.

If ext2 and ext3 didn't support > 2GB files (which was a filesystem
feature added in exactly the same way as extents are today, and nobody
bitched about it then) then they would be relegated to the same status
as minix and xiafs and all the other filesystems that are stuck in the
"we can't change" or "we aren't supported" camps.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:14                     ` Andreas Dilger
@ 2006-06-09 18:51                       ` Jeff Garzik
  2006-06-09 19:49                         ` Theodore Tso
                                           ` (2 more replies)
  0 siblings, 3 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 18:51 UTC (permalink / raw)
  To: Matthew Frost, Alex Tomas, Jeff Garzik, Linus Torvalds,
	Andrew Morton, ext2-devel, linux-kernel, cmm, linux-fsdevel

Andreas Dilger wrote:
> On Jun 09, 2006  13:04 -0500, Matthew Frost wrote:
>> Alex Tomas wrote:
>>> sorry, I disagree. for example, NUMA isn't default and shouldn't be.
>>> but we have it in the tree and any one may choose to use it.
>> NUMA is designed to cope with a hardware feature, which not everybody 
>> has.  Filesystem upgrades are not qualitatively similar; it does not 
>> depend on one's hardware design as to whether one uses ext3, let alone 
>> extents.  Your logic is faulty.
> 
> If you have a > 8TB block device (which is common in large RAID devices
> today, and will be a single disk in a couple of years) then it is important
> that your filesystem work with this block device.
> 
> If ext2 and ext3 didn't support > 2GB files (which was a filesystem
> feature added in exactly the same way as extents are today, and nobody
> bitched about it then) then they would be relegated to the same status
> as minix and xiafs and all the other filesystems that are stuck in the
> "we can't change" or "we aren't supported" camps.

PRECISELY.  So you should stop modifying a filesystem whose design is 
admittedly _not_ modern!

ext3 is already essentially xiafs-on-life-support, when you consider 
today's large storage systems and today's filesystem technology.  Just 
look at the ugly hacks needed to support expanding an ext3 filesystem 
online.

	Jeff




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:51                       ` Jeff Garzik
@ 2006-06-09 19:49                         ` Theodore Tso
  2006-06-09 20:04                           ` Jeff Garzik
  2006-06-11 16:02                         ` Arjan van de Ven
  2006-06-12 22:06                         ` Pavel Machek
  2 siblings, 1 reply; 104+ messages in thread
From: Theodore Tso @ 2006-06-09 19:49 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Matthew Frost, Alex Tomas, Linus Torvalds, Andrew Morton,
	ext2-devel, linux-kernel, cmm, linux-fsdevel

On Fri, Jun 09, 2006 at 02:51:55PM -0400, Jeff Garzik wrote:
> ext3 is already essentially xiafs-on-life-support, when you consider 
> today's large storage systems and today's filesystem technology.  Just 
> look at the ugly hacks needed to support expanding an ext3 filesystem 
> online.

And what ugly hacks are you talking about?  It's actually quite clean;
with the latest e2fsprogs, you use the same command (resize2fs) for
doing both online and offline resizing.

						- Ted

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 19:49                         ` Theodore Tso
@ 2006-06-09 20:04                           ` Jeff Garzik
  2006-06-09 20:57                             ` Stephen C. Tweedie
  2006-06-09 22:37                             ` Andreas Dilger
  0 siblings, 2 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 20:04 UTC (permalink / raw)
  To: Theodore Tso, Jeff Garzik, Matthew Frost, Alex Tomas,
	Linus Torvalds, Andrew Morton, ext2-devel, linux-kernel, cmm,
	linux-fsdevel

Theodore Tso wrote:
> On Fri, Jun 09, 2006 at 02:51:55PM -0400, Jeff Garzik wrote:
>> ext3 is already essentially xiafs-on-life-support, when you consider 
>> today's large storage systems and today's filesystem technology.  Just 
>> look at the ugly hacks needed to support expanding an ext3 filesystem 
>> online.
> 
> And what ugly hacks are you talking about?  It's actually quite clean;
> with the latest e2fsprogs, you use the same command (resize2fs) for
> doing both online and offline resizing.

Consider a blkdev of size S1.  Using LVM we increase that value under 
the hood to size S2, where S2 > S1.  We perform an online resize from 
size S1 to S2.  The size and alignment of any new groups added will 
differ from the non-resize case, where mke2fs was run directly on a 
blkdev of size S2.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 20:04                           ` Jeff Garzik
@ 2006-06-09 20:57                             ` Stephen C. Tweedie
  2006-06-09 21:49                               ` Jeff Garzik
  2006-06-09 22:37                             ` Andreas Dilger
  1 sibling, 1 reply; 104+ messages in thread
From: Stephen C. Tweedie @ 2006-06-09 20:57 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, Theodore Ts'o, Matthew Frost, Stephen Tweedie,
	ext2-devel@lists.sourceforge.net, linux-kernel, Linus Torvalds,
	Mingming Cao, linux-fsdevel, Alex Tomas

Hi,

On Fri, 2006-06-09 at 16:04 -0400, Jeff Garzik wrote:

> Consider a blkdev of size S1.  Using LVM we increase that value under 
> the hood to size S2, where S2 > S1.  We perform an online resize from 
> size S1 to S2.  The size and alignment of any new groups added will 
> differ from the non-resize case, where mke2fs was run directly on a 
> blkdev of size S2.

No, they won't.  We simply grow the last block group in the filesystem
up to the size where we'd naturally add another block group anyway; and
then, we add another block group exactly where it would have been on a
fresh mkfs.
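
Roughly, as a sketch with made-up helper names (and ignoring how a very
small trailing group is handled): group boundaries depend only on the
first data block and the blocks-per-group value, so the group covering
any given block is the same whether the filesystem was grown to that
size or created at it.

```c
#include <stdint.h>

/* Number of block groups needed to cover `blocks` blocks. */
static uint32_t toy_group_count(uint64_t blocks, uint32_t first_data_block,
				uint32_t blocks_per_group)
{
	return (uint32_t)((blocks - first_data_block + blocks_per_group - 1) /
			  blocks_per_group);
}

/* First block of group `g`; independent of the filesystem size. */
static uint64_t toy_group_start(uint32_t g, uint32_t first_data_block,
				uint32_t blocks_per_group)
{
	return first_data_block + (uint64_t)g * blocks_per_group;
}

/* Number of blocks in the last (possibly partial) group. */
static uint32_t toy_last_group_blocks(uint64_t blocks,
				      uint32_t first_data_block,
				      uint32_t blocks_per_group)
{
	uint32_t groups = toy_group_count(blocks, first_data_block,
					  blocks_per_group);

	return (uint32_t)(blocks - toy_group_start(groups - 1,
						   first_data_block,
						   blocks_per_group));
}
```

Growing from 100000 to 200000 blocks with 32768 blocks per group first
fills group 3 (which held only 1696 blocks), then adds groups 4 to 6
starting at blocks 131072, 163840 and 196608.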

--Stephen

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 20:57                             ` Stephen C. Tweedie
@ 2006-06-09 21:49                               ` Jeff Garzik
  2006-06-09 21:55                                 ` [Ext2-devel] " Stephen C. Tweedie
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 21:49 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Andrew Morton, Theodore Ts'o, Matthew Frost,
	ext2-devel@lists.sourceforge.net, linux-kernel, Linus Torvalds,
	Mingming Cao, linux-fsdevel, Alex Tomas

Stephen C. Tweedie wrote:
> Hi,
> 
> On Fri, 2006-06-09 at 16:04 -0400, Jeff Garzik wrote:
> 
>> Consider a blkdev of size S1.  Using LVM we increase that value under 
>> the hood to size S2, where S2 > S1.  We perform an online resize from 
>> size S1 to S2.  The size and alignment of any new groups added will 
>> differ from the non-resize case, where mke2fs was run directly on a 
>> blkdev of size S2.
> 
> No, they won't.  We simply grow the last block group in the filesystem
> up to the size where we'd naturally add another block group anyway; and
> then, we add another block group exactly where it would have been on a
> fresh mkfs.

Yes but the inodes per group etc. would differ.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 21:49                               ` Jeff Garzik
@ 2006-06-09 21:55                                 ` Stephen C. Tweedie
  2006-06-09 23:44                                   ` Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Stephen C. Tweedie @ 2006-06-09 21:55 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, Theodore Ts'o, Matthew Frost,
	ext2-devel@lists.sourceforge.net, linux-kernel, Linus Torvalds,
	Mingming Cao, linux-fsdevel, Alex Tomas, Stephen Tweedie

Hi,

On Fri, 2006-06-09 at 17:49 -0400, Jeff Garzik wrote:

> >> Consider a blkdev of size S1.  Using LVM we increase that value under 
> >> the hood to size S2, where S2 > S1.  We perform an online resize from 
> >> size S1 to S2.  The size and alignment of any new groups added will 
> >> differ from the non-resize case, where mke2fs was run directly on a 
> >> blkdev of size S2.
> > 
> > No, they won't.  We simply grow the last block group in the filesystem
> > up to the size where we'd naturally add another block group anyway; and
> > then, we add another block group exactly where it would have been on a
> > fresh mkfs.
> 
> Yes but the inodes per group etc. would differ.

No, we add the same number of inodes in the new groups that all the
previous groups have.

--Stephen



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 21:55                                 ` [Ext2-devel] " Stephen C. Tweedie
@ 2006-06-09 23:44                                   ` Jeff Garzik
  2006-06-10  0:45                                     ` [Ext2-devel] " Andreas Dilger
  2006-06-10  0:47                                     ` Theodore Tso
  0 siblings, 2 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 23:44 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Andrew Morton, Theodore Ts'o, Matthew Frost,
	ext2-devel@lists.sourceforge.net, linux-kernel, Linus Torvalds,
	Mingming Cao, linux-fsdevel, Alex Tomas

Stephen C. Tweedie wrote:
> Hi,
> 
> On Fri, 2006-06-09 at 17:49 -0400, Jeff Garzik wrote:
> 
>>>> Consider a blkdev of size S1.  Using LVM we increase that value under 
>>>> the hood to size S2, where S2 > S1.  We perform an online resize from 
>>>> size S1 to S2.  The size and alignment of any new groups added will 
>>>> differ from the non-resize case, where mke2fs was run directly on a 
>>>> blkdev of size S2.
>>> No, they won't.  We simply grow the last block group in the filesystem
>>> up to the size where we'd naturally add another block group anyway; and
>>> then, we add another block group exactly where it would have been on a
>>> fresh mkfs.
>> Yes but the inodes per group etc. would differ.
> 
> No, we add the same number of inodes in the new groups that all the
> previous groups have.

Yes.  Re-read what I wrote.  To put it another way, "mkfs S1 + resize to 
S2" does not produce precisely the same layout as "mkfs S2".

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 23:44                                   ` Jeff Garzik
@ 2006-06-10  0:45                                     ` Andreas Dilger
  2006-06-10  0:47                                     ` Theodore Tso
  1 sibling, 0 replies; 104+ messages in thread
From: Andreas Dilger @ 2006-06-10  0:45 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Stephen C. Tweedie, Andrew Morton, Theodore Ts'o,
	Matthew Frost, ext2-devel@lists.sourceforge.net, linux-kernel,
	Linus Torvalds, Mingming Cao, linux-fsdevel, Alex Tomas

On Jun 09, 2006  19:44 -0400, Jeff Garzik wrote:
> Stephen C. Tweedie wrote:
> > No, we add the same number of inodes in the new groups that all the
> > previous groups have.
> 
> Yes.  Re-read what I wrote.  To put it another way, "mkfs S1 + resize to 
> S2" does not produce precisely the same layout as "mkfs S2".

And in what way is that important?  I mean, really, if this is your argument
that ext3 online resizing is a "hack" then it is pretty weak.  This does
not affect the operation or compatibility of the resized filesystem all the
way back to the stone age (i.e. every single ext2 kernel ever will work
with the resized filesystem).  That is why online resizing (and the resize
inode) are a COMPAT feature.
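
For reference, a simplified sketch of the mount-time rule (the struct
and function names are illustrative, not the actual kernel definitions):
unknown COMPAT bits are ignored, unknown RO_COMPAT bits only permit a
read-only mount, and unknown INCOMPAT bits refuse the mount.

```c
#include <stdint.h>

enum toy_mount_result { TOY_MOUNT_RW, TOY_MOUNT_RO_ONLY, TOY_MOUNT_REFUSED };

/* The three feature bitmasks, as stored on disk or as known to a kernel. */
struct toy_features {
	uint32_t compat;
	uint32_t ro_compat;
	uint32_t incompat;
};

static enum toy_mount_result
toy_check_features(const struct toy_features *on_disk,
		   const struct toy_features *supported)
{
	/* Any unknown INCOMPAT bit: this kernel must not touch the fs. */
	if (on_disk->incompat & ~supported->incompat)
		return TOY_MOUNT_REFUSED;
	/* Unknown RO_COMPAT bit: reading is safe, writing is not. */
	if (on_disk->ro_compat & ~supported->ro_compat)
		return TOY_MOUNT_RO_ONLY;
	/* Unknown COMPAT bits are deliberately not checked at all. */
	return TOY_MOUNT_RW;
}
```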

If I "cp b a /mnt/newfs" and "cp a b /mnt/newfs", "a" and "b" will have
different inode numbers too, but that doesn't mean that "cp" is a "hack".

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 23:44                                   ` Jeff Garzik
  2006-06-10  0:45                                     ` [Ext2-devel] " Andreas Dilger
@ 2006-06-10  0:47                                     ` Theodore Tso
  2006-06-10  1:09                                       ` Jeff Garzik
  1 sibling, 1 reply; 104+ messages in thread
From: Theodore Tso @ 2006-06-10  0:47 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, Matthew Frost, Stephen C. Tweedie,
	ext2-devel@lists.sourceforge.net, linux-kernel, Linus Torvalds,
	Mingming Cao, linux-fsdevel, Alex Tomas

On Fri, Jun 09, 2006 at 07:44:44PM -0400, Jeff Garzik wrote:
> Yes.  Re-read what I wrote.  To put it another way, "mkfs S1 + resize to 
> S2" does not produce precisely the same layout as "mkfs S2".

Different in the same way that "mke2fs -E stride=5" results in a slightly
different location for the block bitmaps, inode bitmaps, and
inode table, yes --- but SO WHAT?  

There's a *reason* that the block group descriptors tell the kernel
where to find the block/inode bitmaps and the inode table.  They can
change due to bad blocks in the filesystem, or requests to subtly
change the layout to optimize various RAID layouts, for example.  And
exactly how the block/inode bitmaps get laid out in response to
-E stride has also changed over time, depending on the version of
e2fsprogs, but ---- News flash!! --- it doesn't matter!!!

Jeff, you seem to think that the fact that the layout isn't precisely
the same after an on-line resizing is proof of something horrible, but
it isn't.  The exact location of filesystem metadata has never been
fixed, not in the past ten years of ext2/3 history, and this is not a
big deal.  It certainly isn't "proof" of on-line resizing being
something horrible, as you keep trying to claim, without any arguments
other than, "The layout is different!".  

Oh my, hide the women and children...

							- Ted

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-10  0:47                                     ` Theodore Tso
@ 2006-06-10  1:09                                       ` Jeff Garzik
  2006-06-10  1:30                                         ` [Ext2-devel] " Andreas Dilger
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-10  1:09 UTC (permalink / raw)
  To: Theodore Tso, Jeff Garzik, Stephen C. Tweedie, Andrew Morton,
	Matthew Frost, ext2-devel@lists.sourceforge.net, linux-kernel,
	Linus Torvalds, Mingming Cao, linux-fsdevel, Alex Tomas

Theodore Tso wrote:
> Jeff, you seem to think that the fact that the layout isn't precisely
> the same after an on-line resizing is proof of something horrible, but
> it isn't.  The exact location of filesystem metadata has never been
> fixed, not in the past ten years of ext2/3 history, and this is not a
> big deal.  It certainly isn't "proof" of on-line resizing being
> something horrible, as you keep trying to claim, without any arguments
> other than, "The layout is different!".  

No, I was proving merely that it is _different_.  And the values where 
you see a _difference_ are the ones which are no longer sized 
optimally after you grow the fs to a larger size.

So you incur a performance penalty for resizing to size S2, rather than 
mke2fs'ing the new blkdev at size S2.  Certainly within the confines of 
ext3 that cannot be helped, but a different inode allocation strategy 
could improve upon that.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-10  1:09                                       ` Jeff Garzik
@ 2006-06-10  1:30                                         ` Andreas Dilger
  2006-06-10  1:43                                           ` Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Andreas Dilger @ 2006-06-10  1:30 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Theodore Tso, Stephen C. Tweedie, Andrew Morton, Matthew Frost,
	ext2-devel@lists.sourceforge.net, linux-kernel, Linus Torvalds,
	Mingming Cao, linux-fsdevel, Alex Tomas

On Jun 09, 2006  21:09 -0400, Jeff Garzik wrote:
> Theodore Tso wrote:
> > Jeff, you seem to think that the fact that the layout isn't precisely
> > the same after an on-line resizing is proof of something horrible, but
> > it isn't.  The exact location of filesystem metadata has never been
> > fixed, not in the past ten years of ext2/3 history, and this is not a
> > big deal.  It certainly isn't "proof" of on-line resizing being
> > something horrible, as you keep trying to claim, without any arguments
> > other than, "The layout is different!".  
> 
> No, I was proving merely that it is _different_.  And the values where 
> you see a _difference_ are the ones which are no longer sized 
> optimally after you grow the fs to a larger size.

It sounds like you don't know what you are talking about, which is OK,
except that you keep harping on some non-existent point.

> So you incur a performance penalty for resizing to size S2, rather than 
> mke2fs'ing the new blkdev at size S2.  Certainly within the confines of 
> ext3 that cannot be helped, but a different inode allocation strategy 
> could improve upon that.

???  Can you please be specific in what the performance penalty is, and
what specifically is "not sized optimally" after a resize?  How exactly
does inode allocation strategy relate at all to online resizing?

Given that Ted and I are both disagreeing with you, and we are the two
people who know the most about the online resizing code (SCT is also
in this same group), maybe you should just concede that you are incorrect
on this point and move on.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-10  1:30                                         ` [Ext2-devel] " Andreas Dilger
@ 2006-06-10  1:43                                           ` Jeff Garzik
  2006-06-10  2:03                                             ` Theodore Tso
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-10  1:43 UTC (permalink / raw)
  To: Jeff Garzik, Theodore Tso, Stephen C. Tweedie, Andrew Morton,
	Matthew Frost, ext2-devel@lists.sourceforge.net, linux-kernel,
	Linus Torvalds, Mingming Cao, linux-fsdevel, Alex Tomas

Andreas Dilger wrote:
> On Jun 09, 2006  21:09 -0400, Jeff Garzik wrote:
>> Theodore Tso wrote:
>>> Jeff, you seem to think that the fact that the layout isn't precisely
>>> the same after an on-line resizing is proof of something horrible, but
>>> it isn't.  The exact location of filesystem metadata has never been
>>> fixed, not in the past ten years of ext2/3 history, and this is not a
>>> big deal.  It certainly isn't "proof" of on-line resizing being
>>> something horrible, as you keep trying to claim, without any arguments
>>> other than, "The layout is different!".  
>> No, I was proving merely that it is _different_.  And the values where 
>> you see a _difference_ are the ones which are no longer sized 
>> optimally after you grow the fs to a larger size.
> 
> It sounds like you don't know what you are talking about, which is OK,
> except that you keep harping on some non-existent point.
> 
>> So you incur a performance penalty for resizing to size S2, rather than 
>> mke2fs'ing the new blkdev at size S2.  Certainly within the confines of 
>> ext3 that cannot be helped, but a different inode allocation strategy 
>> could improve upon that.
> 
> ???  Can you please be specific in what the performance penalty is, and
> what specifically is "not sized optimally" after a resize?  How exactly
> does inode allocation strategy relate at all to online resizing?

Inodes per group / inode blocks per group, as I've already stated.

	Jeff



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-10  1:43                                           ` Jeff Garzik
@ 2006-06-10  2:03                                             ` Theodore Tso
  2006-06-10  2:11                                               ` [Ext2-devel] " Jeff Garzik
  2006-06-10  2:58                                               ` Jeff Garzik
  0 siblings, 2 replies; 104+ messages in thread
From: Theodore Tso @ 2006-06-10  2:03 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, Matthew Frost, Stephen C. Tweedie,
	ext2-devel@lists.sourceforge.net, linux-kernel, Linus Torvalds,
	Mingming Cao, linux-fsdevel, Alex Tomas

On Fri, Jun 09, 2006 at 09:43:14PM -0400, Jeff Garzik wrote:
> >???  Can you please be specific in what the performance penalty is, and
> >what specifically is "not sized optimally" after a resize?  How exactly
> >does inode allocation strategy relate at all to online 
> >resizing?
> 
> Inodes per group / inode blocks per group, as I've already stated.

Nope!  Inodes per group and inode blocks per group are maintained
across an online resize.  So there is no difference in inodes per
group for a filesystem created at size S1 and resized to size S2
(using either an on-line or off-line resize), and a filesystem which
is created to be size S2.

As Andreas has said, "you don't know what you are talking about."

						- Ted

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-10  2:03                                             ` Theodore Tso
@ 2006-06-10  2:11                                               ` Jeff Garzik
  2006-06-10  2:58                                               ` Jeff Garzik
  1 sibling, 0 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-10  2:11 UTC (permalink / raw)
  To: Theodore Tso, Jeff Garzik, Stephen C. Tweedie, Andrew Morton,
	Matthew Frost, ext2-devel@lists.sourceforge.net, linux-kernel,
	Linus Torvalds, Mingming Cao, linux-fsdevel, Alex Tomas

Theodore Tso wrote:
> On Fri, Jun 09, 2006 at 09:43:14PM -0400, Jeff Garzik wrote:
>>> ???  Can you please be specific in what the performance penalty is, and
>>> what specifically is "not sized optimally" after a resize?  How exactly
>>> does inode allocation strategy relate at all to online 
>>> resizing?
>> Inodes per group / inode blocks per group, as I've already stated.
> 
> Inodes per group and inode blocks per group are maintained
> across an online resize.

That's the problem I'm pointing out.


> So there is no difference in inodes per
> group for a filesystem created at size S1 and resized to size S2
> (using either an on-line or off-line resize), and a filesystem which
> is created to be size S2.

Trivial to prove false, by your statement above if nothing else.  But 
anyway:
Run mke2fs on a blkdev of size 500MB, and one of 500GB.  Note values.
Now resize blkdev formatted for size 500MB to 500GB, and note differences.
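
One way to "note values" for a comparison like this is to pull individual fields out of the superblock summary printed by `dumpe2fs -h`. The following is a minimal sketch, assuming the summary uses "Label: value" lines such as "Inodes per group:"; the helper name `sb_field` is hypothetical and not part of e2fsprogs.

```shell
#!/bin/bash
# sb_field LABEL
# Read "Label: value" lines (as printed by `dumpe2fs -h`) on stdin and
# print the value of the first line whose label matches LABEL exactly.
# Hypothetical helper for illustration only.
sb_field() {
	awk -F: -v label="$1" '
		$1 == label {
			sub(/^[ \t]+/, "", $2)	# strip the padding before the value
			print $2
			exit
		}'
}

# Possible use (the device name is a placeholder):
#   dumpe2fs -h /dev/XXX 2>/dev/null | sb_field "Inodes per group"
```

Running it against both filesystems and comparing the output gives the "inodes per group" figures being argued about, without reading the full dump by eye.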

	Jeff




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-10  2:03                                             ` Theodore Tso
  2006-06-10  2:11                                               ` [Ext2-devel] " Jeff Garzik
@ 2006-06-10  2:58                                               ` Jeff Garzik
  1 sibling, 0 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-10  2:58 UTC (permalink / raw)
  To: Theodore Tso, Andreas Dilger, Stephen C. Tweedie
  Cc: Andrew Morton, Matthew Frost, ext2-devel@lists.sourceforge.net,
	linux-kernel, Linus Torvalds, Mingming Cao, linux-fsdevel,
	Alex Tomas

Theodore Tso wrote:
> Inodes per group and inode blocks per group are maintained
> across an online resize.  So there is no difference in inodes per
> group for a filesystem created at size S1 and resized to size S2
> (using either an on-line or off-line resize), and a filesystem which
> is created to be size S2.


Here are real numbers, which illustrate how the above two statements 
contradict, and how the second statement is false:

blkdev A, formatted with a 50MB filesystem
	block size		4096
	block count		12800 (size S1)
	inodes per group	12800
blkdev A, formatted to full capacity (~350GB)
	block size		4096
	block count		95472256 (size S2)
	inodes per group	32768

Case 1:	online resize from 50MB to 350GB
Result:	inodes per group == 12800 (it remains the same)

Case 2: mke2fs blkdev A, with no block-count restrictions
Result:	inodes per group == 32768

Thus, each block group holds fewer inodes in case #1 than in case #2.
Thus, case #2 has greater inode density than case #1.

Overall,
a) mke2fs chooses optimal values based on creation-time block count
b) online resize does not change these values

thus the values are no longer optimal.  And in this case, they are never 
-more- optimal, and potentially -less- optimal.
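
The arithmetic behind the two cases can be sketched as follows. This is only an illustration using the figures quoted above; it assumes 32768 blocks per group (one 4096-byte bitmap block, 8 bits per byte) and that a partial last group still carries a full set of inodes. The function names are hypothetical.

```shell
#!/bin/bash
# Figures taken from the message above.
BLOCK_SIZE=4096
BLOCKS_S2=95472256		# block count at size S2 (~350GB)

# Assumption: one bitmap block per group, so a group covers
# 8 * block size blocks.
BLOCKS_PER_GROUP=$((8 * BLOCK_SIZE))

# group_count BLOCKS: number of block groups, rounded up so that a
# partial last group is counted.
group_count() {
	echo $(( ($1 + BLOCKS_PER_GROUP - 1) / BLOCKS_PER_GROUP ))
}

# total_inodes BLOCKS INODES_PER_GROUP: every group has the same number
# of inodes, so the total is groups * inodes per group.
total_inodes() {
	echo $(( $(group_count "$1") * $2 ))
}

echo "groups at S2:        $(group_count "$BLOCKS_S2")"
echo "case 1 total inodes: $(total_inodes "$BLOCKS_S2" 12800)"
echo "case 2 total inodes: $(total_inodes "$BLOCKS_S2" 32768)"
```

The sketch only quantifies the difference in inode counts; whether that difference amounts to a performance penalty is the point under dispute in the thread.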

	Jeff




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 20:04                           ` Jeff Garzik
  2006-06-09 20:57                             ` Stephen C. Tweedie
@ 2006-06-09 22:37                             ` Andreas Dilger
  1 sibling, 0 replies; 104+ messages in thread
From: Andreas Dilger @ 2006-06-09 22:37 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Theodore Tso, Matthew Frost, Alex Tomas, Linus Torvalds,
	Andrew Morton, ext2-devel, linux-kernel, cmm, linux-fsdevel

On Jun 09, 2006  16:04 -0400, Jeff Garzik wrote:
> Theodore Tso wrote:
> > And what ugly hacks are you talking about?  It's actually quite clean;
> > with the latest e2fsprogs, you use the same command (resize2fs) for
> > doing both online and offline resizing.
> 
> Consider a blkdev of size S1.  Using LVM we increase that value under 
> the hood to size S2, where S2 > S1.  We perform an online resize from 
> size S1 to S2.  The size and alignment of any new groups added will 
> be different from the non-resize case, where mke2fs was run directly on a 
> blkdev of size S2.

Umm, and how is that a problem?  Either you want online resizing because
it provides some useful functionality, or you don't want it because you
are concerned with something that nobody else in the world is.  In the
latter case, don't use it.  Even if the metadata alignment is slightly
different on disk, that doesn't make it in any way an invalid filesystem.  In
fact, online resizing is 100% compatible after the resize back to the
dark ages of linux.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:51                       ` Jeff Garzik
  2006-06-09 19:49                         ` Theodore Tso
@ 2006-06-11 16:02                         ` Arjan van de Ven
  2006-06-11 16:30                           ` Nikita Danilov
  2006-06-12 22:06                         ` Pavel Machek
  2 siblings, 1 reply; 104+ messages in thread
From: Arjan van de Ven @ 2006-06-11 16:02 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Matthew Frost, Alex Tomas, Linus Torvalds, Andrew Morton,
	ext2-devel, linux-kernel, cmm, linux-fsdevel

On Fri, 2006-06-09 at 14:51 -0400, Jeff Garzik wrote:
> PRECISELY.  So you should stop modifying a filesystem whose design is 
> admittedly _not_ modern!
> 
> ext3 is already essentially xiafs-on-life-support, when you consider 
> today's large storage systems and today's filesystem technology.  Just 
> look at the ugly hacks needed to support expanding an ext3 filesystem 
> online.

actually I think I disagree with you. One thing I've noticed over the
years is that ext2 layout has one thing going for it: it is simple and
robust. Maybe "ext2 layout" is the wrong word, "block bitmap and
direct/indirect block based" may be better. It seems that once you go
into tree space (and I would call htree a borderline thing there) you
get both really complex code and fragile behavior all over (mostly in
terms of "when something goes wrong")

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-11 16:02                         ` Arjan van de Ven
@ 2006-06-11 16:30                           ` Nikita Danilov
  2006-06-11 16:55                             ` [Ext2-devel] " Arjan van de Ven
  0 siblings, 1 reply; 104+ messages in thread
From: Nikita Danilov @ 2006-06-11 16:30 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, Matthew Frost, ext2-devel, linux-kernel,
	Linus Torvalds, cmm, linux-fsdevel, Alex Tomas

Arjan van de Ven writes:
 > On Fri, 2006-06-09 at 14:51 -0400, Jeff Garzik wrote:
 > > PRECISELY.  So you should stop modifying a filesystem whose design is 
 > > admittedly _not_ modern!
 > > 
 > > ext3 is already essentially xiafs-on-life-support, when you consider 
 > > today's large storage systems and today's filesystem technology.  Just 
 > > look at the ugly hacks needed to support expanding an ext3 filesystem 
 > > online.
 > 
 > 
 > actually I think I disagree with you. One thing I've noticed over the
 > years is that ext2 layout has one thing going for it: it is simple and
 > robust. Maybe "ext2 layout" is the wrong word, "block bitmap and
 > direct/indirect block based" may be better. It seems that once you go
 > into tree space (and I would call htree a borderline thing there) you
 > get both really complex code and fragile behavior all over (mostly in
 > terms of "when something goes wrong")

Huh? Direct/indirect/double-indirect/... _is_ a tree, albeit not a
balanced one. What makes s5fs/ffs/ufs/ext* so exceptionally robust is
the fixed position of inode tables, which provides a guaranteed starting
point for fsck under almost any circumstances.
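
To make the "unbalanced tree" point concrete, here is a small sketch that reports how many indirect blocks have to be read to reach a given logical block of a file. It assumes the classic ext2 inode layout (12 direct pointers, then one single-, one double- and one triple-indirect pointer) and 4-byte block pointers; the function name `lookup_depth` is hypothetical.

```shell
#!/bin/bash
# Assumption: 12 direct block pointers per inode.
NDIRECT=12

# lookup_depth BLOCK_SIZE LOGICAL_BLOCK
# Print the number of indirect blocks (0 to 3) that must be read to
# reach LOGICAL_BLOCK, or "out of range" if it lies beyond what the
# triple-indirect pointer can address.
lookup_depth() {
	local ptrs=$(( $1 / 4 ))	# 4-byte pointers per indirect block
	local n=$2
	local span=$ptrs		# blocks addressable at the current depth
	local depth

	if (( n < NDIRECT )); then
		echo 0
		return
	fi
	n=$(( n - NDIRECT ))
	for depth in 1 2 3; do
		if (( n < span )); then
			echo "$depth"
			return
		fi
		n=$(( n - span ))
		span=$(( span * ptrs ))
	done
	echo "out of range"
}

lookup_depth 4096 5		# small file: direct pointer
lookup_depth 4096 5000		# needs the double-indirect level
```

The depth depends only on the logical block number, never on how the file grew, which is one way to read the later remark in this thread that the tree is not dynamic.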

Nikita.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-11 16:30                           ` Nikita Danilov
@ 2006-06-11 16:55                             ` Arjan van de Ven
  0 siblings, 0 replies; 104+ messages in thread
From: Arjan van de Ven @ 2006-06-11 16:55 UTC (permalink / raw)
  To: Nikita Danilov
  Cc: Matthew Frost, Alex Tomas, Linus Torvalds, Andrew Morton,
	ext2-devel, linux-kernel, cmm, linux-fsdevel

On Sun, 2006-06-11 at 20:30 +0400, Nikita Danilov wrote:
> Arjan van de Ven writes:
>  > On Fri, 2006-06-09 at 14:51 -0400, Jeff Garzik wrote:
>  > > PRECISELY.  So you should stop modifying a filesystem whose design is 
>  > > admittedly _not_ modern!
>  > > 
>  > > ext3 is already essentially xiafs-on-life-support, when you consider 
>  > > today's large storage systems and today's filesystem technology.  Just 
>  > > look at the ugly hacks needed to support expanding an ext3 filesystem 
>  > > online.
>  > 
>  > 
>  > actually I think I disagree with you. One thing I've noticed over the
>  > years is that ext2 layout has one thing going for it: it is simple and
>  > robust. Maybe "ext2 layout" is the wrong word, "block bitmap and
>  > direct/indirect block based" may be better. It seems that once you go
>  > into tree space (and I would call htree a borderline thing there) you
>  > get both really complex code and fragile behavior all over (mostly in
>  > terms of "when something goes wrong")
> 
> Huh? Direct/indirect/double-indirect/... _is_ a tree, albeit not a
> balanced one.

ok sure; the main strength is that it is not a dynamic tree.



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:51                       ` Jeff Garzik
  2006-06-09 19:49                         ` Theodore Tso
  2006-06-11 16:02                         ` Arjan van de Ven
@ 2006-06-12 22:06                         ` Pavel Machek
  2006-06-14 14:31                           ` Barry K. Nathan
  2 siblings, 1 reply; 104+ messages in thread
From: Pavel Machek @ 2006-06-12 22:06 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Matthew Frost, Alex Tomas, Linus Torvalds, Andrew Morton,
	ext2-devel, linux-kernel, cmm, linux-fsdevel

Hi!

> >If ext2 and ext3 didn't support > 2GB files (which was 
> >a filesystem
> >feature added in exactly the same way as extents are 
> >today, and nobody
> >bitched about it then) then they would be relegated to 
> >the same status
> >as minix and xiafs and all the other filesystems that 
> >are stuck in the
> >"we can't change" or "we aren't supported" camps.
> 
> PRECISELY.  So you should stop modifying a filesystem 
> whose design is admittedly _not_ modern!
> 
> ext3 is already essentially xiafs-on-life-support, when 
> you consider today's large storage systems and today's 
> filesystem technology. 

Please don't. AFAIK, ext2/3 is the only filesystem with a working fsck
(because that fsck was actually needed in the old days). Starting from
xfs/jfs/reiser/??? means we no longer have a working fsck...

-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-12 22:06                         ` Pavel Machek
@ 2006-06-14 14:31                           ` Barry K. Nathan
  2006-06-14 21:34                             ` [Ext2-devel] " Pavel Machek
  0 siblings, 1 reply; 104+ messages in thread
From: Barry K. Nathan @ 2006-06-14 14:31 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andrew Morton, Matthew Frost, Jeff Garzik, ext2-devel,
	linux-kernel, Linus Torvalds, cmm, linux-fsdevel, Alex Tomas

On 6/12/06, Pavel Machek <pavel@ucw.cz> wrote:
> Please don't. AFAIK, ext2/3 is the only filesystem with a working fsck
> (because that fsck was actually needed in the old days). Starting from
> xfs/jfs/reiser/??? means we no longer have a working fsck...

Er, what do you mean by "working fsck"?

Unless I'm misunderstanding something, JFS also has a working fsck
(which has actually performed successful repair of real-world
filesystem corruption for me, although I haven't used it as much as
e2fsck or xfs_repair).

XFS's fsck is a no-op, but I think it could be implemented as a
wrapper around xfs_repair (and maybe xfs_check). xfs_repair has
successfully fixed corrupted filesystems for me, just as JFS's fsck
has.

(As for ReiserFS... well, in the past it's probably been too easy to
shoot yourself in the foot with reiserfsck and make the filesystem
worse-to-nonexistent instead of better. I haven't needed to use
reiserfsck on a corrupt FS lately so I don't know how it compares
these days.)
-- 
-Barry K. Nathan <barryn@pobox.com>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-14 14:31                           ` Barry K. Nathan
@ 2006-06-14 21:34                             ` Pavel Machek
  2006-06-15  0:28                               ` Barry K. Nathan
  0 siblings, 1 reply; 104+ messages in thread
From: Pavel Machek @ 2006-06-14 21:34 UTC (permalink / raw)
  To: Barry K. Nathan
  Cc: Jeff Garzik, Matthew Frost, Alex Tomas, Linus Torvalds,
	Andrew Morton, ext2-devel, linux-kernel, cmm, linux-fsdevel

Hi!

> >Please don't. AFAIK, ext2/3 is the only filesystem with a working fsck
> >(because that fsck was actually needed in the old days). Starting from
> >xfs/jfs/reiser/??? means we no longer have a working fsck...
> 
> Er, what do you mean by "working fsck"?

Passes 8 hours of me trying to intentionally break it with weird,
artificial disk corruption.

I even have a script somewhere.

> Unless I'm misunderstanding something, JFS also has a 
> working fsck
> (which has actually performed successful repair of 
> real-world
> filesystem corruption for me, although I haven't used it 
> as much as
> e2fsck or xfs_repair).

...like, if it repaired 100 different, non-trivial corruptions, that
would be an argument.

fsck.ext2 survives my torture (in some versions). fsck.vfat never
worked for me (likes to segfault), fsck.reiser never worked for me.

							Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-14 21:34                             ` [Ext2-devel] " Pavel Machek
@ 2006-06-15  0:28                               ` Barry K. Nathan
  2006-06-15  4:55                                 ` Theodore Tso
  2006-06-15  9:15                                 ` Pavel Machek
  0 siblings, 2 replies; 104+ messages in thread
From: Barry K. Nathan @ 2006-06-15  0:28 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Jeff Garzik, Matthew Frost, Alex Tomas, Linus Torvalds,
	Andrew Morton, ext2-devel, linux-kernel, cmm, linux-fsdevel

On 6/14/06, Pavel Machek <pavel@ucw.cz> wrote:
> Passes 8 hours of me trying to intentionally break it with weird,
> artificial disk corruption.
>
> I even have a script somewhere.

Ok, thanks for clarifying.

> > Unless I'm misunderstanding something, JFS also has a
> > working fsck
> > (which has actually performed successful repair of
> > real-world
> > filesystem corruption for me, although I haven't used it
> > as much as
> > e2fsck or xfs_repair).
>
> ...like, if it repaired 100 different, non-trivial corruptions, that
> would be an argument.

In the case of XFS, I've repaired maybe two dozen (or so) corruptions
that might be non-trivial (in most of the cases, the filesystem
wouldn't even mount before the repair).

> fsck.ext2 survives my torture (in some versions). fsck.vfat never
> worked for me (likes to segfault), fsck.reiser never worked for me.

BTW, I actually have a test filesystem here (an e2image from an actual
filesystem I encountered once) that used to cause e2fsck 1.36/1.37 to
segfault. Strangely, more ancient versions (like what ships in Red Hat
7.2) were able to repair it without segfaulting. In a few days, once
other stuff calms down for me, I need to revisit that and see if the
bug still exists with 1.39.
-- 
-Barry K. Nathan <barryn@pobox.com>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-15  0:28                               ` Barry K. Nathan
@ 2006-06-15  4:55                                 ` Theodore Tso
  2006-06-15  7:43                                   ` Barry K. Nathan
  2006-06-15  9:15                                 ` Pavel Machek
  1 sibling, 1 reply; 104+ messages in thread
From: Theodore Tso @ 2006-06-15  4:55 UTC (permalink / raw)
  To: Barry K. Nathan; +Cc: ext2-devel, linux-kernel, linux-fsdevel

On Wed, Jun 14, 2006 at 05:28:31PM -0700, Barry K. Nathan wrote:
> BTW, I actually have a test filesystem here (an e2image from an actual
> filesystem I encountered once) that used to cause e2fsck 1.36/1.37 to
> segfault. Strangely, more ancient versions (like what ships in Red Hat
> 7.2) were able to repair it without segfaulting. In a few days, once
> other stuff calms down for me, I need to revisit that and see if the
> bug still exists with 1.39.

Please try it with 1.39; if it still crashes, let me know --- I treat
any filesystem corruption that causes e2fsck to crash or which e2fsck
can't fix in a single pass as a bug.  I'm guessing though that this
was probably this bug, which was fixed right after 1.38 was released (some
distributions did have the fix, but it's in the mainline e2fsprogs
starting with 1.39):

2005-07-04  Theodore Ts'o  <tytso@mit.edu>

	* pass2.c (e2fsck_process_bad_inode): Fixed bug which could cause
		e2fsck to core dump if a disconnected inode contained an
		extended attribute.  This was actually caused by two bugs.
		The first bug is that if the inode has been fully fixed
		up, the code will attempt to remove the inode from the
		inode_bad_map without checking to see if this bitmap is
		present.  Since it is cleared at the end of pass 2, if
		e2fsck_process_bad_inode is called in pass 4 (as it is for
		disconnected inodes), this would result in a core dump.
		This bug was mostly hidden by a second bug, which caused
		e2fsck_process_bad_inode() to consider all inodes without
		an extended attribute to be not fixed.  (Addresses Debian
		Bug: #316736)

						- Ted

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-15  4:55                                 ` Theodore Tso
@ 2006-06-15  7:43                                   ` Barry K. Nathan
  0 siblings, 0 replies; 104+ messages in thread
From: Barry K. Nathan @ 2006-06-15  7:43 UTC (permalink / raw)
  To: Theodore Tso, Barry K. Nathan, ext2-devel, linux-kernel,
	linux-fsdevel

On 6/14/06, Theodore Tso <tytso@mit.edu> wrote:
> Please try it with 1.39; if it still crashes, let me know --- I treat
[snip]

1.39 fixes it. Cool!

However, http://e2fsprogs.sourceforge.net/ is still touting the "NEW"
e2fsprogs 1.38 release. I think it would be a good idea to update the
page...
-- 
-Barry K. Nathan <barryn@pobox.com>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-15  0:28                               ` Barry K. Nathan
  2006-06-15  4:55                                 ` Theodore Tso
@ 2006-06-15  9:15                                 ` Pavel Machek
  2006-06-15  9:40                                   ` Barry K. Nathan
  1 sibling, 1 reply; 104+ messages in thread
From: Pavel Machek @ 2006-06-15  9:15 UTC (permalink / raw)
  To: Barry K. Nathan
  Cc: Andrew Morton, Matthew Frost, Jeff Garzik, ext2-devel,
	linux-kernel, Linus Torvalds, cmm, linux-fsdevel, Alex Tomas

Hi!

> >Passes 8 hours of me trying to intentionally break it with weird,
> >artificial disk corruption.
> >
> >I even have a script somewhere.
> 
> Ok, thanks for clarifying.

You can get a copy, it would be interesting to know how JFS/XFS does.

> >> Unless I'm misunderstanding something, JFS also has a
> >> working fsck
> >> (which has actually performed successful repair of
> >> real-world
> >> filesystem corruption for me, although I haven't used it
> >> as much as
> >> e2fsck or xfs_repair).
> >
> >...like, if it repaired 100 different, non-trivial corruptions, that
> >would be an argument.
> 
> In the case of XFS, I've repaired maybe two dozen (or so) corruptions
> that might be non-trivial (in most of the cases, the filesystem
> wouldn't even mount before the repair).
> 
> >fsck.ext2 survives my torture (in some versions). fsck.vfat never
> >worked for me (likes to segfault), fsck.reiser never worked for me.
> 
> BTW, I actually have a test filesystem here (an e2image from an actual
> filesystem I encountered once) that used to cause e2fsck 1.36/1.37 to
> segfault. Strangely, more ancient versions (like what ships in Red Hat
> 7.2) were able to repair it without segfaulting. In a few days, once
> other stuff calms down for me, I need to revisit that and see if the
> bug still exists with 1.39.

It varies a bit between versions, but at least e2fsck has a regression
test suite... I had a nasty e2 corruption in the past (suspend wrote 0 onto
a strategic place in the bitmaps) where it put the filesystem in self-destruct
mode. e2fsck reported fixing the corruption, but did not really fix
it... e2fsck was fixed in the meantime.

(I also have a way to corrupt ext2 in a way that basically can't be
repaired automatically. Deallocating the free block bitmap and putting
data in the freed space is an evil way to corrupt a filesystem).
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-15  9:15                                 ` Pavel Machek
@ 2006-06-15  9:40                                   ` Barry K. Nathan
  2006-06-15  9:50                                     ` [Ext2-devel] " Pavel Machek
  0 siblings, 1 reply; 104+ messages in thread
From: Barry K. Nathan @ 2006-06-15  9:40 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andrew Morton, Matthew Frost, Jeff Garzik, ext2-devel,
	linux-kernel, Linus Torvalds, cmm, linux-fsdevel, Alex Tomas

On 6/15/06, Pavel Machek <pavel@suse.cz> wrote:
> Hi!
>
> > >Passes 8 hours of me trying to intentionally break it with weird,
> > >artificial disk corruption.
> > >
> > >I even have script somewhere.
> >
> > Ok, thanks for clarifying.
>
> You can get a copy, it would be interesting to know how JFS/XFS does.

Ok, I would be interested in getting a copy. (Maybe it would be good
to post it in public so that other people can try it too.)
-- 
-Barry K. Nathan <barryn@pobox.com>


* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-15  9:40                                   ` Barry K. Nathan
@ 2006-06-15  9:50                                     ` Pavel Machek
  0 siblings, 0 replies; 104+ messages in thread
From: Pavel Machek @ 2006-06-15  9:50 UTC (permalink / raw)
  To: Barry K. Nathan
  Cc: Jeff Garzik, Matthew Frost, Alex Tomas, Linus Torvalds,
	Andrew Morton, ext2-devel, linux-kernel, cmm, linux-fsdevel

Hi!

> >> >Passes 8 hours of me trying to intentionally break it with weird,
> >> >artificial disk corruption.
> >> >
> >> >I even have script somewhere.
> >>
> >> Ok, thanks for clarifying.
> >
> >You can get a copy, it would be interesting to know how JFS/XFS does.
> 
> Ok, I would be interested in getting a copy. (Maybe it would be good
> to post it in public so that other people can try it too.)

It needs some hand-tuning to do maximum damage to the filesystem while
still keeping the filesystem "recognizable". It also depends on fsck
returning reasonable error codes...
								Pavel

#!/bin/bash
#
# fscktest
#
# Usage:
#	 Make sure output is logged somewhere
#	 First, run fscktest -p as root
#	 Then you can run fscktest as normal user...
#

FS=ext2

# Build a known-good filesystem image (fsck.okay); needs root for mount.
prepare() {
	SIZE=100000	# image size in KiB
	echo "Creating file..."
	head -c $((SIZE * 1024)) /dev/zero > test
	echo "Making filesystem..."
	mkfs.$FS test
	echo "Mounting..."
	mount -o loop test /mnt || { echo "Can't mount" >&2; exit 1; }
	echo "Copying files..."
	cp -a /bin /mnt
	cp -a /usr/bin /mnt
	cp -a /usr/src/linux /mnt
	echo "Syncing..."
	sync
	echo "Unmounting..."
	umount /mnt
	echo "Moving..."
	mv test fsck.okay
	echo "All done."
}

if [ "$1" = "-p" ]; then
	prepare
	exit
fi

RUN=0
while true; do
	RUN=$((RUN + 1))
	echo "Run #$RUN"
	echo "Preparing..."
	cat fsck.okay > fsck.damaged
	echo "Damaging..."
	# Overwrite 10240 512-byte blocks with random data, starting at block 7.
	dd if=/dev/urandom of=fsck.damaged count=10240 seek=7 conv=notrunc
	# Keep a copy of the damaged image so a failure can be reproduced.
	cp fsck.damaged fsck.test
	echo "First check..."
	fsck.$FS -fy fsck.damaged
	RESULT=$?
	# 0 = clean, 1 = errors corrected, 2 = corrected, reboot suggested.
	if [ $RESULT -ne 0 ] && [ $RESULT -ne 1 ] && [ $RESULT -ne 2 ]; then
		echo "Fsck failed in bad way (result = $RESULT)"
		exit 1
	fi
	echo "Second check..."
	fsck.$FS -fy fsck.damaged
	RESULT=$?
	if [ $RESULT -ne 0 ]; then
		echo "Fsck lied about its success (result = $RESULT)"
		exit 1
	fi
done
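
Since the script depends on fsck returning reasonable error codes, it can
help to print what a given status means when a run fails. The following is
a minimal sketch, assuming the exit-status bits documented in the fsck(8)
manual page; the function name describe_fsck_status is hypothetical and not
part of the script above.

```shell
#!/bin/bash
# Decode an fsck exit status (a bitmask) into readable flags.
# Bit meanings are assumed to follow the fsck(8) manual page.
describe_fsck_status() {
	local status=$1 out=""
	if [ "$status" -eq 0 ]; then
		echo "no errors"
		return
	fi
	[ $((status & 1)) -ne 0 ] && out="$out errors-corrected"
	[ $((status & 2)) -ne 0 ] && out="$out reboot-needed"
	[ $((status & 4)) -ne 0 ] && out="$out errors-uncorrected"
	[ $((status & 8)) -ne 0 ] && out="$out operational-error"
	[ $((status & 16)) -ne 0 ] && out="$out usage-error"
	[ $((status & 32)) -ne 0 ] && out="$out cancelled"
	[ $((status & 128)) -ne 0 ] && out="$out shared-library-error"
	# Strip the leading space left by the first appended flag.
	echo "${out# }"
}
```

Something like `echo "result = $RESULT ($(describe_fsck_status $RESULT))"`
in the failure branches would then show, for example, that status 12 means
both uncorrected errors and an operational error.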


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:56               ` Jeff Garzik
  2006-06-09 16:07                 ` Alex Tomas
@ 2006-06-09 20:52                 ` Stephen C. Tweedie
  2006-06-09 21:47                   ` [Ext2-devel] " Jeff Garzik
  1 sibling, 1 reply; 104+ messages in thread
From: Stephen C. Tweedie @ 2006-06-09 20:52 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, Stephen Tweedie, ext2-devel@lists.sourceforge.net,
	linux-kernel, Linus Torvalds, Mingming Cao, linux-fsdevel,
	Alex Tomas, Andreas Dilger

Hi,

On Fri, 2006-06-09 at 11:56 -0400, Jeff Garzik wrote:

> Think about how this will be deployed in production, long term.
> 
> If extents are not made default at some point, then no one will use the 
> feature, and it should not be merged.

Features such as ACLs and SELinux are still not on by default and are
most *definitely* used.  This is a bogus argument.

> And when extents are default, you have this blizzard-of-feature-flags 
> stealth upgrade event occur _sometime_ after they boot into the new fs 
> for the first time.

No.  I don't see it ever being forced on in the kernel by default, so
there will be no such "stealth upgrades".

Rather, if it is "made default", that will be done by setting the flag
by default on newly-created filesystems in mke2fs.  We won't be playing
magic on existing filesystems.

And to avoid confusion, I am *entirely* open to the idea of making it
only ever default to on in mke2fs at some point in the future where we
batch a set of incompat features with the "ext4" label, so that "mke2fs
-O ext4", or "mke4fs", would set it.  That has already been proposed on
ext2-devel; we're nowhere near the stage of making that default yet.

--Stephen


* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 20:52                 ` Stephen C. Tweedie
@ 2006-06-09 21:47                   ` Jeff Garzik
  0 siblings, 0 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 21:47 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Alex Tomas, Andrew Morton, ext2-devel@lists.sourceforge.net,
	linux-kernel, Linus Torvalds, Mingming Cao, linux-fsdevel,
	Andreas Dilger

Stephen C. Tweedie wrote:
> Hi,
> 
> On Fri, 2006-06-09 at 11:56 -0400, Jeff Garzik wrote:
> 
>> Think about how this will be deployed in production, long term.
>>
>> If extents are not made default at some point, then no one will use the 
>> feature, and it should not be merged.
> 
> Features such as ACLs and SELinux are still not on by default and are
> most *definitely* used.  This is a bogus argument.

They are on in SELinux-enabled distro installs, AFAIK?


>> And when extents are default, you have this blizzard-of-feature-flags 
>> stealth upgrade event occur _sometime_ after they boot into the new fs 
>> for the first time.
> 
> No.  I don't see it ever being forced on in the kernel by default, so
> there will be no such "stealth upgrades".
> 
> Rather, if it is "made default", that will be done by setting the flag
> by default on newly-created filesystems in mke2fs.  We won't be playing
> magic on existing filesystems.
> 
> And to avoid confusion, I am *entirely* open to the idea of making it
> only ever default to on in mke2fs at some point in the future where we
> batch a set of incompat features with the "ext4" label, so that "mke2fs
> -O ext4", or "mke4fs", would set it.  That has already been proposed on
> ext2-devel; we're nowhere near the stage of making that default yet.

Sure.  And why not bundle that with a vehicle for separating out the 
_code_ that deals with ancient formats versus newer formats.  A vehicle 
that enables the existing ext3 stuff to stabilize and freeze, while 
enabling parallel development of new features.

	Jeff




* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:40         ` Linus Torvalds
  2006-06-09 15:47           ` Jeff Garzik
@ 2006-06-09 16:10           ` Alex Tomas
  2006-06-09 16:10             ` Jeff Garzik
  2006-06-09 16:25             ` Linus Torvalds
  1 sibling, 2 replies; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 16:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff Garzik, Andrew Morton, ext2-devel, linux-kernel, cmm,
	linux-fsdevel, Andreas Dilger

>>>>> Linus Torvalds (LT) writes:

 LT> Quite frankly, at this point, there's no way in hell I believe we can do 
 LT> major surgery on ext3. It's the main filesystem for a lot of users, and 
 LT> it's just not worth the instability worries unless it's something very 
 LT> obviously transparent.

I believe it's as stable as before until you mount with the extents
mount option.

thanks, Alex


* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:10           ` Alex Tomas
@ 2006-06-09 16:10             ` Jeff Garzik
  2006-06-09 16:24               ` Erik Mouw
                                 ` (2 more replies)
  2006-06-09 16:25             ` Linus Torvalds
  1 sibling, 3 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 16:10 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Linus Torvalds, Andrew Morton, ext2-devel, linux-kernel, cmm,
	linux-fsdevel, Andreas Dilger

Alex Tomas wrote:
>>>>>> Linus Torvalds (LT) writes:
> 
> 
>  LT> Quite frankly, at this point, there's no way in hell I believe we can do 
>  LT> major surgery on ext3. It's the main filesystem for a lot of users, and 
>  LT> it's just not worth the instability worries unless it's something very 
>  LT> obviously transparent.
> 
> I believe it's as stable as before until you mount with extents
> mount option.

If it will remain a mount option, if it is never made the default 
(either in kernel or distro level), then only 1% of users will ever use 
the feature.  And we shouldn't merge a 1% use feature into the _main_ 
filesystem for Linux.

	Jeff





* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:10             ` Jeff Garzik
@ 2006-06-09 16:24               ` Erik Mouw
  2006-06-09 16:24               ` Chase Venters
  2006-06-09 16:25               ` Alex Tomas
  2 siblings, 0 replies; 104+ messages in thread
From: Erik Mouw @ 2006-06-09 16:24 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Alex Tomas, Andrew Morton, ext2-devel, linux-kernel,
	Linus Torvalds, cmm, linux-fsdevel, Andreas Dilger

On Fri, Jun 09, 2006 at 12:10:59PM -0400, Jeff Garzik wrote:
> Alex Tomas wrote:
> > I believe it's as stable as before until you mount with extents
> > mount option.
> 
> If it will remain a mount option, if it is never made the default 
> (either in kernel or distro level), then only 1% of users will ever use 
> the feature.  And we shouldn't merge a 1% use feature into the _main_ 
> filesystem for Linux.

Why not? That's how htree dir indexing got in, and AFAIK most distros
use it as a default.


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands


* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:10             ` Jeff Garzik
  2006-06-09 16:24               ` Erik Mouw
@ 2006-06-09 16:24               ` Chase Venters
  2006-06-09 16:25               ` Alex Tomas
  2 siblings, 0 replies; 104+ messages in thread
From: Chase Venters @ 2006-06-09 16:24 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Alex Tomas, Linus Torvalds, Andrew Morton, ext2-devel,
	linux-kernel, cmm, linux-fsdevel, Andreas Dilger

On Fri, 9 Jun 2006, Jeff Garzik wrote:

> Alex Tomas wrote:
>> > > > > >  Linus Torvalds (LT) writes:
>>
>> 
>> LT>  Quite frankly, at this point, there's no way in hell I believe we can 
>> LT>  do major surgery on ext3. It's the main filesystem for a lot of users, 
>> LT>  and it's just not worth the instability worries unless it's something 
>> LT>  very obviously transparent.
>>
>>  I believe it's as stable as before until you mount with extents
>>  mount option.
>
> If it will remain a mount option, if it is never made the default (either in 
> kernel or distro level), then only 1% of users will ever use the feature. 
> And we shouldn't merge a 1% use feature into the _main_ filesystem for Linux.

Pardon me because I haven't made it all the way through this discussion 
yet, so I don't know if this has been suggested or dismissed. But I'm 
curious - rather than 'stealth upgrade' by way of mount options, why not 
just enable the functionality either via tune2fs or mkfs.ext3?

New distribution versions could ship installers that enable it, because users 
aren't really going to switch from a new distribution they just installed to 
an older version (same story on the kernel).

Users that want the functionality today can have it by asking for it with 
tune2fs; they just have to bypass the warning that tells them they're not 
going to be able to boot kernels before 2.6.xx.

> 	Jeff

Cheers,
Chase


* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:10             ` Jeff Garzik
  2006-06-09 16:24               ` Erik Mouw
  2006-06-09 16:24               ` Chase Venters
@ 2006-06-09 16:25               ` Alex Tomas
  2006-06-09 16:28                 ` Jeff Garzik
  2 siblings, 1 reply; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 16:25 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Alex Tomas, Linus Torvalds, Andrew Morton, ext2-devel,
	linux-kernel, cmm, linux-fsdevel, Andreas Dilger

>>>>> Jeff Garzik (JG) writes:

 JG> If it will remain a mount option, if it is never made the default
 JG> (either in kernel or distro level), then only 1% of users will ever
 JG> use the feature.  And we shouldn't merge a 1% use feature into the
 JG> _main_ filesystem for Linux.

strictly speaking, not that many users really need >2TB fs ...

thanks, Alex


* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:25               ` Alex Tomas
@ 2006-06-09 16:28                 ` Jeff Garzik
  2006-06-09 16:50                   ` Alex Tomas
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 16:28 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Andrew Morton, ext2-devel, linux-kernel, Linus Torvalds, cmm,
	linux-fsdevel, Andreas Dilger

Alex Tomas wrote:
>>>>>> Jeff Garzik (JG) writes:
> 
>  JG> If it will remain a mount option, if it is never made the default
>  JG> (either in kernel or distro level), then only 1% of users will ever
>  JG> use the feature.  And we shouldn't merge a 1% use feature into the
>  JG> _main_ filesystem for Linux.
> 
> strictly speaking, not that many users really need >2TB fs ...

Not true.  Terabyte SATA drives are less than a year away.  2TB 
drives... probably 2 years?

	Jeff


* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:28                 ` Jeff Garzik
@ 2006-06-09 16:50                   ` Alex Tomas
  2006-06-09 16:53                     ` [Ext2-devel] " Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 16:50 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, ext2-devel, linux-kernel, Linus Torvalds, cmm,
	linux-fsdevel, Alex Tomas, Andreas Dilger

>>>>> Jeff Garzik (JG) writes:

 JG> Alex Tomas wrote:
 >>>>>>> Jeff Garzik (JG) writes:
 JG> If it will remain a mount option, if it is never made the
 >> default
 JG> (either in kernel or distro level), then only 1% of users will ever
 JG> use the feature.  And we shouldn't merge a 1% use feature into the
 JG> _main_ filesystem for Linux.
 >> strictly speaking, not that many users really need >2TB fs ...

 JG> Not true.  Terabyte SATA drives are less than a year away.  2TB
 JG> drives... probably 2 years?

oh, 2 years sounds long enough for defaulting extents?

thanks, Alex


* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:50                   ` Alex Tomas
@ 2006-06-09 16:53                     ` Jeff Garzik
  2006-06-09 17:01                       ` Alex Tomas
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 16:53 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Linus Torvalds, Andrew Morton, ext2-devel, linux-kernel, cmm,
	linux-fsdevel, Andreas Dilger

Alex Tomas wrote:
>>>>>> Jeff Garzik (JG) writes:
> 
>  JG> Alex Tomas wrote:
>  >>>>>>> Jeff Garzik (JG) writes:
>  JG> If it will remain a mount option, if it is never made the
>  >> default
>  JG> (either in kernel or distro level), then only 1% of users will ever
>  JG> use the feature.  And we shouldn't merge a 1% use feature into the
>  JG> _main_ filesystem for Linux.
>  >> strictly speaking, not that many users really need >2TB fs ...
> 
>  JG> Not true.  Terabyte SATA drives are less than a year away.  2TB
>  JG> drives... probably 2 years?
> 
> oh, 2 years sound long enough for defaulting extents?

If terabyte drives will be here in less than a year, and 750GB drives 
are already here, then people with today's commodity hardware are 
probably already chomping at the bit to do >2TB LVM and RAID.

Hook eight 750GB SATA drives to a Marvell SATA controller (all 
commodity, all production) and you're way past 2TB.
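
As a rough check of that arithmetic, here is a small sketch. It assumes
decimal gigabytes and raw capacity only, with no RAID parity or filesystem
overhead subtracted; the variable names are illustrative.

```shell
#!/bin/bash
# Raw capacity of eight 750GB drives, in decimal gigabytes.
drives=8
drive_gb=750
total_gb=$((drives * drive_gb))
echo "total: ${total_gb} GB"

# Compare against a 2TB limit (2000 GB in the same decimal units).
limit_gb=2000
if [ "$total_gb" -gt "$limit_gb" ]; then
	echo "exceeds 2 TB by $((total_gb - limit_gb)) GB"
fi
```

Even with one drive's worth of capacity given up to parity, seven times
750GB is still well over the 2TB mark.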

	Jeff





* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:53                     ` [Ext2-devel] " Jeff Garzik
@ 2006-06-09 17:01                       ` Alex Tomas
  2006-06-09 17:10                         ` Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 17:01 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Alex Tomas, Linus Torvalds, Andrew Morton, ext2-devel,
	linux-kernel, cmm, linux-fsdevel, Andreas Dilger

that's why we're trying to get it in *now*, because we need it.
and nobody AFAIK insists on making extents the default or such.

thanks, Alex

>>>>> Jeff Garzik (JG) writes:

 JG> If terabyte drives will be here in less than a year, and 750GB drives
 JG> are already here, then people with today's commodity hardware are
 JG> probably already chomping at the bit to do >2TB LVM and RAID.

 JG> Hook eight 750GB SATA drives to a Marvell SATA controller (all
 JG> commodity, all production) and you're way past 2TB.

 JG> 	Jeff


* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 17:01                       ` Alex Tomas
@ 2006-06-09 17:10                         ` Jeff Garzik
  0 siblings, 0 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 17:10 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Linus Torvalds, Andrew Morton, ext2-devel, linux-kernel, cmm,
	linux-fsdevel, Andreas Dilger

Alex Tomas wrote:
> that's why we're trying to get it in *now*. because we need it.
> and nobody AFAIK insists to make extents default or such.

huh?  If it's needed, it will be default eventually.

	Jeff





* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:10           ` Alex Tomas
  2006-06-09 16:10             ` Jeff Garzik
@ 2006-06-09 16:25             ` Linus Torvalds
  2006-06-09 16:48               ` Alex Tomas
                                 ` (3 more replies)
  1 sibling, 4 replies; 104+ messages in thread
From: Linus Torvalds @ 2006-06-09 16:25 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Jeff Garzik, Andrew Morton, ext2-devel, linux-kernel, cmm,
	linux-fsdevel, Andreas Dilger

On Fri, 9 Jun 2006, Alex Tomas wrote:
> 
> I believe it's as stable as before until you mount with extents
> mount option.

That's always a possibility in theory, and almost never in practice.

Btw, I don't care about extents _per_se_. I do care about the fact that 
people seem to think that code gets better as it supports more features. 
Not so.

The whole logic of "code sharing is good" is a huge mistake. Shared code 
is not at all better than individual code snippets, and often much much 
worse. In particular, if the shared code has separate code-paths, it's not 
just twice as complicated: it's _more_ than twice as bad, since it 
introduces the conditionals _and_ it introduces the very real risk of the 
conditional being taken the wrong way by mistake.

In contrast, the last time two different filesystems introduced bugs in 
each other was approximately "never". They simply don't modify each other's 
code, they don't look at each other's data structures, and they don't jump 
into each other's routines.

So two separate filesystems are _less_ to maintain than one big one. Even 
if there's a lot of code that -could- be shared.

And no, extents in themselves aren't necessarily "the thing" that drives 
it from maintainable to unmaintainable. This crap grows over time. But I 
would _seriously_ suggest that starting anew with a "new" filesystem, and 
taking the time to actually also get _rid_ of some of the baggage would 
quite likely be a good idea.

Just as an example: ext3 _sucks_ in many ways. It has huge inodes that 
take up way too much space in memory. It has absolutely disgusting code to 
handle directory reading and writing (buffer heads! In 2006!). Its 
conditional indexing code is horrible. Its performance absolutely sucks 
when the journal is being drained or something.

Are you going to improve on any of those _fundamental_ problems? Or are 
you going to make them worse?

Hint: I'm betting you're not going to improve them by adding more 
features.

		Linus


* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:25             ` Linus Torvalds
@ 2006-06-09 16:48               ` Alex Tomas
  2006-06-09 16:55                 ` Jeff Garzik
  2006-06-09 16:54               ` Linus Torvalds
                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 16:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Jeff Garzik, ext2-devel, linux-kernel, cmm,
	linux-fsdevel, Alex Tomas, Andreas Dilger

so, instead of taking one (quite-well-tested) part that solves one of
the biggest ext3 limitations, you propose to start a new project and
get something in a year (probably) ?

I think about extents as a step-by-step way ...

thanks, Alex

>>>>> Linus Torvalds (LT) writes:

 LT> Just as an example: ext3 _sucks_ in many ways. It has huge inodes that 
 LT> take up way too much space in memory. It has absolutely disgusting code to 
 LT> handle directory reading and writing (buffer heads! In 2006!). Its 
 LT> conditional indexing code is horrible. Its performance absolutely sucks 
 LT> when the journal is being drained or something.

 LT> Are you going to improve on any of those _fundamental_ problems? Or are 
 LT> you going to make them worse?

 LT> Hint: I'm betting you're not going to improve them by adding more 
 LT> features.


* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:48               ` Alex Tomas
@ 2006-06-09 16:55                 ` Jeff Garzik
  2006-06-09 17:12                   ` [Ext2-devel] " Alex Tomas
  2006-06-09 19:57                   ` Theodore Tso
  0 siblings, 2 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 16:55 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Andrew Morton, ext2-devel, linux-kernel, Linus Torvalds, cmm,
	linux-fsdevel, Andreas Dilger

Alex Tomas wrote:
> so, instead of taking one (quite-well-tested) part that solves one of
> the biggest ext3 limitation, you propose to start a new project and
> get something in a year (probably) ?
> 
> I think about extents as a step-by-step way ...

That is what the entirety of Linux development is -- step-by-step.

It is OBVIOUS that it would take five minutes to start ext4.

1) clone a new tree
2) cp -a fs/ext3 fs/ext4
3) apply extent and 48bit patches
4) apply related e2fsprogs patches

Then update ext4 step-by-step, using the normal Linux development process.
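
Step 2 of the list above can be sketched as a small shell helper. This is
only an illustration: fork_fs is a hypothetical name, and steps 3 and 4 are
left out because the patch file names are not given here. A real fork would
presumably also need renamed symbols and its own Kconfig/Makefile entries so
that both filesystems can be built into one kernel; the sketch does not
attempt that.

```shell
#!/bin/bash
# Copy fs/<old> to fs/<new> inside a kernel source tree.
# Usage: fork_fs TREE OLD NEW   (e.g. fork_fs ~/linux ext3 ext4)
fork_fs() {
	local tree=$1 old=$2 new=$3
	if [ ! -d "$tree/fs/$old" ]; then
		echo "no fs/$old in $tree" >&2
		return 1
	fi
	# Refuse to overwrite an existing directory.
	if [ -e "$tree/fs/$new" ]; then
		echo "fs/$new already exists in $tree" >&2
		return 1
	fi
	# -a preserves modes and timestamps, as in the step above.
	cp -a "$tree/fs/$old" "$tree/fs/$new"
}
```

Refusing to overwrite an existing fs/ext4 keeps a second run from silently
merging two copies.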

	Jeff


* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:55                 ` Jeff Garzik
@ 2006-06-09 17:12                   ` Alex Tomas
  2006-06-09 17:12                     ` Jeff Garzik
  2006-06-09 19:57                   ` Theodore Tso
  1 sibling, 1 reply; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 17:12 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Alex Tomas, Linus Torvalds, Andrew Morton, ext2-devel,
	linux-kernel, cmm, linux-fsdevel, Andreas Dilger

>>>>> Jeff Garzik (JG) writes:

 JG> That is what the entirety of Linux development is -- step-by-step.

 JG> It is OBVIOUS that it would take five minutes to start ext4.

right. it's not a problem to *start*. it's a problem to maintain.
day by day fs/ext3 and fs/ext4 will get more and more diffs.
at some point it will be a headache to apply patches from ext3
to ext4 and back. I know this very well ....

thanks, Alex


* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 17:12                   ` [Ext2-devel] " Alex Tomas
@ 2006-06-09 17:12                     ` Jeff Garzik
  0 siblings, 0 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 17:12 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Linus Torvalds, Andrew Morton, ext2-devel, linux-kernel, cmm,
	linux-fsdevel, Andreas Dilger

Alex Tomas wrote:
>>>>>> Jeff Garzik (JG) writes:
> 
>  JG> That is what the entirety of Linux development is -- step-by-step.
> 
>  JG> It is OBVIOUS that it would take five minutes to start ext4.
> 
> right. it's not a problem to *start*. it's a problem to maintain.
> day by day fs/ext3 and fs/ext4 will get more and more diffs.
> at some point it will be a headache to apply patches from ext3
> to ext4 and back. I know this very well ....

As Linus has stated, we have empirical evidence that splitting 
filesystems works, for both stability and development speed.

The number of patches to ext[23] will trickle off over time.  As the 
obvious example, ext4 would receive the extent and 48bit patches rather 
than ext3 :)

	Jeff





* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:55                 ` Jeff Garzik
  2006-06-09 17:12                   ` [Ext2-devel] " Alex Tomas
@ 2006-06-09 19:57                   ` Theodore Tso
  2006-06-09 20:09                     ` Jeff Garzik
  2006-06-09 20:38                     ` Joel Becker
  1 sibling, 2 replies; 104+ messages in thread
From: Theodore Tso @ 2006-06-09 19:57 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, ext2-devel, linux-kernel, Linus Torvalds, cmm,
	linux-fsdevel, Alex Tomas, Andreas Dilger

On Fri, Jun 09, 2006 at 12:55:09PM -0400, Jeff Garzik wrote:
> That is what the entirety of Linux development is -- step-by-step.
> 
> It is OBVIOUS that it would take five minutes to start ext4.
> 
> 1) clone a new tree
> 2) cp -a fs/ext3 fs/ext4
> 3) apply extent and 48bit patches
> 4) apply related e2fsprogs patches
> 
> Then update ext4 step-by-step, using the normal Linux development process.

We don't do this with the SCSI layer where we make a complete clone of
the driver layer so that there is a /usr/src/linux/driver/scsi and
/usr/src/linux/driver/scsi2, do we?  And we didn't do that with the
networking layer either, as we added ipsec, ipv6, softnet, and a whole
host of other changes and improvements.  

What we do instead is we have a series of patches, which can be made
available in various experimental trees, and as they get more
polishing and experience with people using it without any problems,
they can get merged into the -mm tree, and then eventually, when they
are deemed ready, into mainline.  That is also the normal Linux
development process, and it's worked quite well up until now with ext3.

Folks seem to be worried about ext3 being "too important to experiment
with", but the fact remains, we've been doing continuous improvement
with ext3 for quite some time, and it's been quite smooth.  The htree
introduction was essentially completely painless, for example --- and
people liked the fact that they could get the features of indexed
directories without needing to do a complete dump and restore of the
filesystem.

						- Ted


* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 19:57                   ` Theodore Tso
@ 2006-06-09 20:09                     ` Jeff Garzik
  2006-06-09 20:14                       ` Alex Tomas
  2006-06-09 20:38                     ` Joel Becker
  1 sibling, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 20:09 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Andrew Morton, ext2-devel, linux-kernel, Linus Torvalds, cmm,
	linux-fsdevel, Alex Tomas, Andreas Dilger

Theodore Tso wrote:
> We don't do this with the SCSI layer where we make a complete clone of
> the driver layer so that there is a /usr/src/linux/driver/scsi and
> /usr/src/linux/driver/scsi2, do we?  And we didn't do that with the
> networking layer either, as we added ipsec, ipv6, softnet, and a whole
> host of other changes and improvements.  
> 
> What we do instead is we have a series of patches, which can be made
> available in various experimental trees, and as they get more
> polishing and experience with people using it without any problems,
> they can get merged into the -mm tree, and then eventually, when they
> are deemed ready, into mainline.  That is also the normal Linux
> development process, and it's worked quite well up until now with ext3.

No, there is a key difference between ext3 and SCSI/etc.:  cruft is removed.

In ext3, old formats are supported for all eternity.


> Folks seem to be worried about ext3 being "too important to experiment
> with", but the fact remains, we've been doing continuous improvement
> with ext3 for quite some time, and it's been quite smooth.  The htree
> introduction was essentially completely painless, for example --- and

I disagree.  There were some distro annoyances as I recall.


> people liked the fact that they could get the features of indexed
> directories without needing to do a complete dump and restore of the
> filesystem.

Of course people always like new features.  :)

ext4 should allow you to deliver new features more rapidly, while 
keeping the existing ext3 happily stable.

	Jeff


* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 20:09                     ` Jeff Garzik
@ 2006-06-09 20:14                       ` Alex Tomas
  2006-06-19  7:48                         ` [Ext2-devel] " Helge Hafting
  0 siblings, 1 reply; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 20:14 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, Theodore Tso, ext2-devel, linux-kernel,
	Linus Torvalds, cmm, linux-fsdevel, Alex Tomas, Andreas Dilger

>>>>> Jeff Garzik (JG) writes:

 JG> No, there is a key difference between ext3 and SCSI/etc.:  cruft is removed.

 JG> In ext3, old formats are supported for all eternity.

we'd need this anyway. just to let users migrate.

thanks, Alex


* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 20:14                       ` Alex Tomas
@ 2006-06-19  7:48                         ` Helge Hafting
  0 siblings, 0 replies; 104+ messages in thread
From: Helge Hafting @ 2006-06-19  7:48 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Jeff Garzik, Theodore Tso, Andrew Morton, ext2-devel,
	linux-kernel, Linus Torvalds, cmm, linux-fsdevel, Andreas Dilger

Alex Tomas wrote:
>>>>>> Jeff Garzik (JG) writes:
>>>>>>             
>
>  JG> No, there is a key difference between ext3 and SCSI/etc.:  cruft is removed.
>
>  JG> In ext3, old formats are supported for all eternity.
>
> we'd need this anyway. just to let users to migrate.
>   
Not really.  Today, people use reiserfs even though they couldn't
just remount their old ext2 as reiserfs.

An ext2/ext3-incompatible ext4 isn't a problem.  Sure, people will
have to mkfs instead of just remounting, and that will mean fewer
quick conversions in the short term.  But people using ext3 today
don't really need ext4 - they are by definition running on sufficiently
small disks/partitions.

So an incompatible ext4 will still see use - on new filesystems mostly.
Not a problem, people buy disks all the time.

Helge Hafting


* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 19:57                   ` Theodore Tso
  2006-06-09 20:09                     ` Jeff Garzik
@ 2006-06-09 20:38                     ` Joel Becker
  2006-06-09 20:50                       ` Dave Jones
  2006-06-09 21:03                       ` Theodore Tso
  1 sibling, 2 replies; 104+ messages in thread
From: Joel Becker @ 2006-06-09 20:38 UTC (permalink / raw)
  To: Theodore Tso, Jeff Garzik, Alex Tomas, Andrew Morton, ext2-devel,
	linux-kernel, Linus Torvalds, cmm, linux-fsdevel, Andreas Dilger

On Fri, Jun 09, 2006 at 03:57:50PM -0400, Theodore Tso wrote:
> We don't do this with the SCSI layer where we make a complete clone of
> the driver layer so that there is a /usr/src/linux/driver/scsi and
> /usr/src/linux/driver/scsi2, do we?  And we didn't do that with the
> networking layer either, as we added ipsec, ipv6, softnet, and a whole
> host of other changes and improvements.  

Ted,
	We don't have any permanent, physical representation of the
state either.  With a filesystem we do.  I don't care how many changes
you made to the SCSI stack.  The code from a year ago could be entirely
different.  However, if the old stack and the new stack both support
card X, then it Just Works.  The Adaptec driver is a case in point.
When the new driver was still flaky, folks and distros could select the
old driver with impunity.  Running the new driver didn't fundamentally
change your Adaptec card so you couldn't run the old one.
	Filesystem features are different.  There is a permanent state
that the older code cannot read.  Alex claims people just shouldn't use
"-o extents", but the fact is their distro will choose it for them.  We
have multiboot machines in our test lab, because like many people we
don't have unlimited funds.  What happened when we installed the 2.6
distros?  All of a sudden the older 2.4 distros wouldn't mount the
shared filesystems, because of ext3 features.  This wasn't the kernel
driver, this was merely the tools!  Surprise!  We made no choice to use
new features, and they were thrust upon us.  This will happen to others.

Joel

-- 

"Sometimes one pays most for the things one gets for nothing."
        - Albert Einstein

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 20:38                     ` Joel Becker
@ 2006-06-09 20:50                       ` Dave Jones
  2006-06-09 21:32                         ` [Ext2-devel] " Jeff Garzik
  2006-06-09 21:03                       ` Theodore Tso
  1 sibling, 1 reply; 104+ messages in thread
From: Dave Jones @ 2006-06-09 20:50 UTC (permalink / raw)
  To: Theodore Tso, Jeff Garzik, Alex Tomas, Andrew Morton, ext2-devel,
	linux-kernel, Linus Torvalds, cmm, linux-fsdevel, Andreas Dilger

On Fri, Jun 09, 2006 at 01:38:03PM -0700, Joel Becker wrote:
 > that the older code cannot read.  Alex claims people just shouldn't use
 > "-o extents", but the fact is their distro will choose it for them.

.. on partitions over a certain size, which couldn't be read with
older ext3 filesystems _anyway_

Enabling it by default on partitions of a size less than those
that need extents seems to be somewhat pointless to me?

Am I missing something fundamental that precludes the use of both
extent-based and current existing filesystems from the same code
simultaneously ?

		Dave

-- 
http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 20:50                       ` Dave Jones
@ 2006-06-09 21:32                         ` Jeff Garzik
  2006-06-09 22:56                           ` Andreas Dilger
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 21:32 UTC (permalink / raw)
  To: Dave Jones
  Cc: Theodore Tso, Alex Tomas, Andrew Morton, ext2-devel, linux-kernel,
	Linus Torvalds, cmm, linux-fsdevel, Andreas Dilger

Dave Jones wrote:
> Am I missing something fundamental that precludes the use of both
> extent-based and current existing filesystems from the same code
> simultaneously ?

Nothing precludes it.  The point is that introducing major format 
changes inline in this manner just complicates the code progressively to 
the point where your directory walking, inode block walking, and other 
code winds up being

	if (new)
		...
	else
		...

_anyway_, at which point it is essentially multiple independent 
filesystems.  I guarantee this won't be the last fundamental fs metadata 
design change people will want to make...

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 21:32                         ` [Ext2-devel] " Jeff Garzik
@ 2006-06-09 22:56                           ` Andreas Dilger
  2006-06-09 23:09                             ` Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Andreas Dilger @ 2006-06-09 22:56 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Dave Jones, Theodore Tso, Alex Tomas, Andrew Morton, ext2-devel,
	linux-kernel, Linus Torvalds, cmm, linux-fsdevel

On Jun 09, 2006  17:32 -0400, Jeff Garzik wrote:
> Dave Jones wrote:
> >Am I missing something fundamental that precludes the use of both
> >extent-based and current existing filesystems from the same code
> >simultaneously ?
> 
> Nothing precludes it.  The point is that introducing major format 
> changes inline in this manner just complicates the code progressively to 
> the point where your directory walking, inode block walking, and other 
> code winds up being
> 
> 	if (new)
> 		...
> 	else
> 		...
> 
> _anyway_, at which point it is essentially multiple independent 
> filesystems.  I guarantee this won't be the last fundamental fs metadata 
> design change people will want to make...

Umm, and how is this fundamentally different from similar code paths in
the VFS (e.g. O_DIRECT vs regular writes)?  Should we make a copy of the
whole write path for each of O_DIRECT, AIO, pwrite, etc writes, or should
we instead add in a small change to the write path that leverages the
majority of the existing code?

What is better, using the 95% of the VFS that is common and change 5% to
work with the filesystem, or duplicate the whole VFS just because 5%
needs to be different?

In the extents case, the large majority of the ext3 code is the same
(directory, inode handling, superblock, etc) and only the on-disk format
for indirect blocks has changed.  Yes, we also want to change the block
allocator next in order to improve the performance in conjunction with
extents, but that is purely an in-memory change that has no direct
relation to on-disk layout.  The major motivations for the extents format:
(a) more compact on-disk representation for large files (improves unlink
    performance, reduces memory usage for metadata)
(b) support for larger filesystems (which will affect everyone soon enough).
(c) integrate well with improved allocation support
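
To make (a) concrete, here is a minimal sketch, not the ext3 on-disk
format and not the patch code.  It flattens the indirect-block tree into
a plain per-block list and compares it with an extent list; the names
Extent, lookup_block_map and lookup_extents and all block numbers are
made up for illustration.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    logical: int   # first logical block covered by this extent
    physical: int  # physical block that the first logical block maps to
    length: int    # number of contiguous blocks in the run


def lookup_block_map(block_map: List[int], logical: int) -> Optional[int]:
    """Per-block mapping: one stored entry for every logical block."""
    if 0 <= logical < len(block_map):
        return block_map[logical]
    return None


def lookup_extents(extents: List[Extent], logical: int) -> Optional[int]:
    """Extent mapping: binary-search the runs (sorted by logical start)."""
    starts = [e.logical for e in extents]
    i = bisect_right(starts, logical) - 1
    if i < 0:
        return None
    e = extents[i]
    if logical < e.logical + e.length:
        return e.physical + (logical - e.logical)
    return None  # logical block falls in a hole or past the end


# A 1000-block file laid out contiguously from physical block 5000
# (illustrative numbers only).
block_map = [5000 + i for i in range(1000)]  # 1000 entries of metadata
extents = [Extent(0, 5000, 1000)]            # 1 entry of metadata
```

Both lookups give the same answer for the same logical block, but the
per-block form needs an entry per block while the contiguous file needs
a single extent.  A badly fragmented file would need many extents, so
the saving depends on how well the allocator keeps files contiguous.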

For most of the ext3 developers (b) is the primary motivation here, and
given that so many people are vocal about ext3 changes that must mean
that there are a lot of ext3 users here.  Does that mean that in the next
few years all of the objectors will abandon ext3 in favour of ext4 or XFS
or JFS or reiserfs or reiser4 when you get a new system with a single 12TB
disk?  And we can delete ext3 then, or will you be happy then that ext3
supports these large disks without any effort on your part?

Maybe we should start by deleting ext2 because it is old and obsolete?
The reality is that we will never merge the forks back once they are made.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 22:56                           ` Andreas Dilger
@ 2006-06-09 23:09                             ` Jeff Garzik
  2006-06-09 23:37                               ` [Ext2-devel] " Andreas Dilger
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 23:09 UTC (permalink / raw)
  To: Jeff Garzik, Dave Jones, Theodore Tso, Alex Tomas, Andrew Morton,
	ext2-devel, linux-kernel, Linus Torvalds, cmm, linux-fsdevel

Andreas Dilger wrote:
> Maybe we should start by deleting ext2 because it is old and obsolete?
> The reality is that we will never merge the forks back once they are made.

We _already have_ a relevant example:  ext2 -> ext3.

A useful fork is in the tree, and you're working on it.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 23:09                             ` Jeff Garzik
@ 2006-06-09 23:37                               ` Andreas Dilger
  0 siblings, 0 replies; 104+ messages in thread
From: Andreas Dilger @ 2006-06-09 23:37 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Dave Jones, Theodore Tso, Alex Tomas, Andrew Morton, ext2-devel,
	linux-kernel, Linus Torvalds, cmm, linux-fsdevel

On Jun 09, 2006  19:09 -0400, Jeff Garzik wrote:
> Andreas Dilger wrote:
> >Maybe we should start by deleting ext2 because it is old and obsolete?
> >The reality is that we will never merge the forks back once they are made.
> 
> We _already have_ a relevant example:  ext2 -> ext3.
> 
> A useful fork is in the tree, and you're working on it.

OK, you're right.  We'll continue working on the fork (namely ext3) and
when people who care consider those features stable enough they can port
them to ext2. :-)

Like another person pointed out - there are bugs that are fixed in ext3
that aren't fixed in ext2, and vice versa.  Even though the ext2 code
is basically dead, new bugs are still found in it.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 20:38                     ` Joel Becker
  2006-06-09 20:50                       ` Dave Jones
@ 2006-06-09 21:03                       ` Theodore Tso
  2006-06-09 21:24                         ` Joel Becker
  1 sibling, 1 reply; 104+ messages in thread
From: Theodore Tso @ 2006-06-09 21:03 UTC (permalink / raw)
  To: Jeff Garzik, Alex Tomas, Andrew Morton, ext2-devel, linux-kernel,
	Linus Torvalds, cmm, linux-fsdevel, Andreas Dilger

On Fri, Jun 09, 2006 at 01:38:03PM -0700, Joel Becker wrote:
> 	Filesystem features are different.  There is a permanent state
> that the older code cannot read.  Alex claims people just shouldn't use
> "-o extents", but the fact is their distro will choose it for them.  We
> have multiboot machines in our test lab, because like many people we
> don't have unlimited funds.  What happened when we installed the 2.6
> distros?  All of a sudden the older 2.4 distros wouldn't mount the
> shared filesystems, because of ext3 features.

This is going to happen regardless of whether we call the code base
"ext3" or "ext4".  Anytime you make format changes (in this case, to
support larger disk sizes) older kernels won't support it any more.
Surprise!  

In the case you were referring to, one specific distribution, Red Hat,
silently added the extended attributes feature to the filesystem
because it was needed by SELINUX.  This was actually a backwards
compatible feature, so that older 2.4 based distributions could
*mount* the filesystem.  Unfortunately e2fsck needs to be more
careful, and so the problem was that the older distribution's fsck
wasn't able to check the filesystem.  But this was actually Red Hat's
fault, in that they shouldn't have transparently added the extended
attribute feature without first asking the user's permission.   

Being able to forward upgrade to newer filesystem formats is a good
thing, and has a long history; users don't like to do a backup,
reformat, and restore pass if they can help it.  Heck, Microsoft
Windows even has a way that they can upgrade a FAT filesystem to their
latest NTFSv5 filesystem using a userspace program.  Providing such a
capability is not a bad thing, and in fact it is a good thing.  The
bad thing to do is to do the conversion without first asking the
user's permission (for example just as Windows XP does when you first
boot a preinstalled system on a laptop for the first time).

People seem to be arguing that just because a distribution installer
_could_ do a backwards incompatible upgrade without first asking
permission, we must not provide the capability at all, and make
it be the case that the only way to upgrade from ext3 to ext4 is with
a backup, reformat, and restore.  Surely that doesn't make sense!

> This wasn't the kernel driver, this was merely the tools!  Surprise!
> We made no choice to use new features, and they were thrust upon us.
> This will happen to others.

I suspect that Red Hat has learned from that past experience, and
won't be making that mistake again, at least without explicitly
requesting the user's permission.  So how about we trust the
distributions to be a bit more careful this time around?

						- Ted

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 21:03                       ` Theodore Tso
@ 2006-06-09 21:24                         ` Joel Becker
  2006-06-09 21:36                           ` [Ext2-devel] " Chase Venters
  2006-06-09 21:51                           ` Theodore Tso
  0 siblings, 2 replies; 104+ messages in thread
From: Joel Becker @ 2006-06-09 21:24 UTC (permalink / raw)
  To: Theodore Tso, Jeff Garzik, Alex Tomas, Andrew Morton, ext2-devel,
	linux-kernel, Linus Torvalds, cmm, linux-fsdevel, Andreas Dilger

On Fri, Jun 09, 2006 at 05:03:19PM -0400, Theodore Tso wrote:
> This is going to happen regardless of whether we call the code base
> "ext3" or "ext4".  Anytime you make format changes (in this case, to
> support larger disk sizes) older kernels won't support it any more.
> Surprise!  

	Of course format changes break things.  But if you claim that
"X" and "Y" are the same thing, and they aren't, people won't see it
coming.

> wasn't able to check the filesystem.  But this was actually Red Hat's
> fault, in that they shouldn't have transparently added the extended
> attribute feature without first asking the user's permission.   

	Sure it was Red Hat's fault.  Knowing who to blame doesn't solve
the existing problem, though.  They never even put out e2fsck upgrades
for older distros, which would have solved the problem just as easily.

> Being able to forward upgrade to newer filesystem formats is a good
> thing, and has a long history; users don't like to do a backup,
> reformat, and restore pass if they can help it.  Heck, Microsoft
> Windows even has a way that they can upgrade a FAT filesystem to their
> latest NTFSv5 filesystem using a userspace program.  Providing such a
> capability is not a bad thing, and in fact it is a good thing.  The
> bad thing to do is to do the conversion without first asking the
> user's permission (for example just as Windows XP does when you first
> boot a preinstalled system on a laptop for the first time).

	This entire statement is true.  However, note that "FAT" becomes
"NTFSv5", and there is no expectation, implicit or explicit, that you
can use "FAT" to mount the changed volume.
	You can call the new filesystem ext4, and mount an old ext3 as
ext4, and guess what?  You're just as forward compatible, but now you've
explicitly specified the lack of backwards compatibility.  You could even
provide a userspace tool just like in your example to switch an INCOMPAT
feature.

> People seem to be arguing that just because a distribution installer
> _could_ do a backwards incompatible upgrade without first asking
> permission, we must not provide the capability at all, and make
> it be the case that the only way to upgrade from ext3 to ext4 is with
> a backup, reformat, and restore.  Surely that doesn't make sense!

	There is no reason you need a backup/restore cycle.  Mount it as
ext4, and forever forward it's an ext4.  In the ext2->ext3 cycle, we
called it "tune2fs -j".
 
> So how about we trust the distributions to be a bit more careful
> this time around?

	Haha, you're funny.
	Seriously, Ted, I personally have one concern here.  I don't
care much about the maintainability of one code base versus two.  Both
have advantages and problems.  I care a little that my "used to be
stable" ext3 code base might be destabilized, but I know that the ext2/3
team has been better than most at stable code transitions.
	What I do care about is that "ext3" can no longer mount partition X.
That's gonna happen.  This thing still has the same name, but it is in
actuality something very different.  When ext2 could no longer mount a
journaled version of itself, we changed it to "ext3".
	Heck, forget the name, just make the breakage more explicit.  Do
it at mkfs/tunefs time.  "tunefs -extents" or "mkfs -t ext3 -extents".
A mount option assumes that you can do with or without it.  If you do it
once, you can mount the next time without it and stuff Just Works.  Even
htree follows this.  A clean unmount leaves a clean directory structure
that a non-htree driver can use.

Joel

-- 

"Not being known doesn't stop the truth from being true."
        - Richard Bach

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 21:24                         ` Joel Becker
@ 2006-06-09 21:36                           ` Chase Venters
  2006-06-09 21:51                           ` Theodore Tso
  1 sibling, 0 replies; 104+ messages in thread
From: Chase Venters @ 2006-06-09 21:36 UTC (permalink / raw)
  To: Joel Becker
  Cc: Theodore Tso, Jeff Garzik, Alex Tomas, Andrew Morton, ext2-devel,
	linux-kernel, cmm, linux-fsdevel, Andreas Dilger

On Fri, 9 Jun 2006, Joel Becker wrote:

> 	Heck, forget the name, just make the breakage more explicit.  Do
> it at mkfs/tunefs time.  "tunefs -extents" or "mkfs -t ext3 -extents".
> A mount option assumes that you can do with or without it.  If you do it
> once, you can mount the next time without it and stuff Just Works.  Even
> htree follows this.  A clean unmount leaves a clean directory structure
> that a non-htree driver can use.

I suggested this somewhere back in the thread and it got no play. What's 
the problem with doing things this way? (Aside from it being a compromise 
that doesn't automatically result in a new ext4)

Of course, there are a few debates going on here. Only one of them is 
about compatibility.

>
> Joel
>

Cheers,
Chase

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 21:24                         ` Joel Becker
  2006-06-09 21:36                           ` [Ext2-devel] " Chase Venters
@ 2006-06-09 21:51                           ` Theodore Tso
  2006-06-09 22:07                             ` Joel Becker
  1 sibling, 1 reply; 104+ messages in thread
From: Theodore Tso @ 2006-06-09 21:51 UTC (permalink / raw)
  To: Jeff Garzik, Alex Tomas, Andrew Morton, ext2-devel, linux-kernel,
	Linus Torvalds, cmm, linux-fsdevel, Andreas Dilger

On Fri, Jun 09, 2006 at 02:24:10PM -0700, Joel Becker wrote:
> 	Heck, forget the name, just make the breakage more explicit.  Do
> it at mkfs/tunefs time.  "tunefs -extents" or "mkfs -t ext3 -extents".
> A mount option assumes that you can do with or without it.  If you do it
> once, you can mount the next time without it and stuff Just Works.  Even
> htree follows this.  A clean unmount leaves a clean directory structure
> that a non-htree driver can use.

Agreed; I was never a fan of how we enabled extended attributes
using a mount option, as it clutters the /etc/fstab line, among other
things.  (I added the tune2fs -o feature so that default mount options
could be stored in the superblock to try to cover that design botch,
but the real answer is that extended attributes should never have been
done via a mount option, or at least not as the only thing you had
to do before the feature became enabled.)

The right approach is what we did with journaling (tune2fs -j or
tune2fs -O has_journal) and what we did with htree support (tune2fs -O
dir_index), to explicitly enable that feature, and that's certainly
what I will be pushing for.

						- Ted

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 21:51                           ` Theodore Tso
@ 2006-06-09 22:07                             ` Joel Becker
  2006-06-09 22:31                               ` [Ext2-devel] " Theodore Tso
  0 siblings, 1 reply; 104+ messages in thread
From: Joel Becker @ 2006-06-09 22:07 UTC (permalink / raw)
  To: Theodore Tso, Jeff Garzik, Alex Tomas, Andrew Morton, ext2-devel,
	linux-kernel, Linus Torvalds, cmm, linux-fsdevel, Andreas Dilger

On Fri, Jun 09, 2006 at 05:51:37PM -0400, Theodore Tso wrote:
> The right approach is what we did with journaling (tune2fs -j or
> tune2fs -O has_journal) and what we did with htree support (tune2fs -O
> dir_index), to explicitly enable that feature, and that's certainly
> what I will be pushing for.

	Excellent.  And now let's close the other side of compatibility.
The attribute problem we discussed with e2fsck has a simple solution:
exit cleanly when you don't understand a filesystem.
	If e2fsck finds an INCOMPAT flag it doesn't understand, it
didn't *fail* to fsck, it just plain doesn't understand the filesystem.
This should not, in any way, prevent bootup from continuing.  Later,
mount may succeed (if the kernel is new enough) or fail (if not), but my
system won't be completely unusable by surprise (assuming that / isn't
the affected filesystem).
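
Roughly what I mean, as a sketch only.  The superblock keeps three
feature sets (compat, ro_compat, incompat); the bit values below are
hypothetical, and the function names and status strings are made up
here rather than taken from e2fsprogs or the kernel.  The checker side
is assumed to balk at any unknown bit, which matches the xattr case
discussed earlier, where old kernels could mount but old e2fsck could
not check.

```python
# Hypothetical masks for the features this tool version understands.
# The real flag values are defined by e2fsprogs and the kernel headers.
KNOWN_COMPAT = 0x0003
KNOWN_RO_COMPAT = 0x0001
KNOWN_INCOMPAT = 0x0001


def unknown_bits(flags: int, known: int) -> int:
    """Return the feature bits that are set but not understood."""
    return flags & ~known


def mount_decision(ro_compat: int, incompat: int, read_write: bool) -> str:
    """Kernel-side rule: unknown INCOMPAT bits always block the mount,
    unknown RO_COMPAT bits only block a read-write mount."""
    if unknown_bits(incompat, KNOWN_INCOMPAT):
        return "refuse"
    if read_write and unknown_bits(ro_compat, KNOWN_RO_COMPAT):
        return "refuse"
    return "mount"


def fsck_decision(compat: int, ro_compat: int, incompat: int) -> str:
    """Checker-side rule as proposed here: report "not understood" as
    its own status instead of folding it into a generic failure."""
    if (unknown_bits(compat, KNOWN_COMPAT)
            or unknown_bits(ro_compat, KNOWN_RO_COMPAT)
            or unknown_bits(incompat, KNOWN_INCOMPAT)):
        return "not-understood"
    return "check"
```

The boot scripts could then treat "not-understood" differently from a
real fsck failure and leave the final word to mount.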

Joel

-- 

Bram's Law:
	The easier a piece of software is to write, the worse it's
	implemented in practice.

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 22:07                             ` Joel Becker
@ 2006-06-09 22:31                               ` Theodore Tso
  2006-06-09 22:47                                 ` Joel Becker
  0 siblings, 1 reply; 104+ messages in thread
From: Theodore Tso @ 2006-06-09 22:31 UTC (permalink / raw)
  To: Jeff Garzik, Alex Tomas, Andrew Morton, ext2-devel, linux-kernel,
	Linus Torvalds, cmm, linux-fsdevel, Andreas Dilger

On Fri, Jun 09, 2006 at 03:07:11PM -0700, Joel Becker wrote:
> 	Excellent.  And now let's close the other side of compatibility.
> The attribute problem we discussed with e2fsck has a simple solution:
> exit cleanly when you don't understand a filesystem.
> 	If e2fsck finds an INCOMPAT flag it doesn't understand, it
> didn't *fail* to fsck, it just plain doesn't understand the filesystem.
> This should not, in any way, prevent bootup from continuing.  Later,
> mount may succeed (if the kernel is new enough) or fail (if not), but my
> system won't be completely unusable by surprise (assuming that / isn't
> the affected filesystem).

The potential problem with this is that the system administrator may
never realize that the filesystem is just getting silently skipped.
(And a big fat warning printed by e2fsck doesn't help when distros like
Ubuntu use a graphical boot sequence that hides warning messages
printed by e2fsck).

Is it really that hard to edit /etc/fstab so that the fsck pass is
skipped?  

I might be willing to make it be a configurable option in
/etc/e2fsck.conf, but it *is* dangerous to have e2fsck exit with
success without having actually checked the filesystem.

						- Ted

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 22:31                               ` [Ext2-devel] " Theodore Tso
@ 2006-06-09 22:47                                 ` Joel Becker
  2006-06-09 23:54                                   ` [Ext2-devel] " Theodore Tso
  0 siblings, 1 reply; 104+ messages in thread
From: Joel Becker @ 2006-06-09 22:47 UTC (permalink / raw)
  To: Theodore Tso, Jeff Garzik, Alex Tomas, Andrew Morton, ext2-devel,
	linux-kernel, Linus Torvalds, cmm, linux-fsdevel, Andreas Dilger

On Fri, Jun 09, 2006 at 06:31:29PM -0400, Theodore Tso wrote:
> The potential problem with this is that the system administrator may
> never realize that the filesystem is just getting silently skipped.
> (And a big fat warning printed by e2fsck doesn't help when distros like
> Ubuntu use a graphical boot sequence that hides warning messages
> printed by e2fsck).

	Yeah, you're not the only one to point this out.

> Is it really that hard to edit /etc/fstab so that the fsck pass is
> skipped?  

	Depends on how close you are in proximity to the console, I
suspect.  Point is, something _broke_.
	Regardless of the name, clearly we have a _different_
filesystem.  With a clean unmount, a journaled ext3 is still a valid
ext2.  With a clean unmount, an extended-attribute ext3 is still a valid
plain ext3 and a valid ext2.  With a clean unmount, a dir_index ext3 is
still a valid plain ext3 and a valid ext2.  An extents ext3 is NEVER a
valid plain ext3 or ext2.
	What happens today if you have a filesystem in fstab that
has no fsck in /sbin (eg, we all pick the name 'ext4', it says 'ext4' in
fstab, but there is no /sbin/fsck.ext4)?  Does "fsck -a" skip the
partition, or halt and fail the boot?  If the latter, I suspect that the
only solution is "I hope you don't encounter this on remote machines ha
ha ha".  If it skips, we have yet another reason that using the same
name is a bad thing.

Joel

-- 

"Sometimes when reading Goethe I have the paralyzing suspicion
 that he is trying to be funny."
         - Guy Davenport

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker@oracle.com
Phone: (650) 506-8127

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 22:47                                 ` Joel Becker
@ 2006-06-09 23:54                                   ` Theodore Tso
  0 siblings, 0 replies; 104+ messages in thread
From: Theodore Tso @ 2006-06-09 23:54 UTC (permalink / raw)
  To: Jeff Garzik, Alex Tomas, Andrew Morton, ext2-devel, linux-kernel,
	Linus Torvalds, cmm, linux-fsdevel, Andreas Dilger

On Fri, Jun 09, 2006 at 03:47:00PM -0700, Joel Becker wrote:
> 	What happens today if you have a filesystem in fstab that
> has no fsck in /sbin (eg, we all pick the name 'ext4', it says 'ext4' in
> fstab, but there is no /sbin/fsck.ext4)?  Does "fsck -a" skip the
> partition, or halt and fail the boot?  If the latter, I suspect that the
> only solution is "I hope you don't encounter this on remote machines ha
> ha ha".  

It will halt and fail the boot.

Of course, installing a kernel more recent than 2.6.14 or so on a RHEL4
system when you have a SCSI controller such as MPT Fusion will also
cause the system to fail to boot unless you remember to compile it
directly into the kernel because of changes in semantics about whether
the SCSI probing happens before or after the module load completes ---
and the answer that has been given is "we don't care".  So these sorts
of traps have been around for people who are going back and forth
between the bleeding edge and distro systems, but I think we'd all
agree that this isn't necessarily the common case.

							- Ted

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:25             ` Linus Torvalds
  2006-06-09 16:48               ` Alex Tomas
@ 2006-06-09 16:54               ` Linus Torvalds
  2006-06-09 17:04                 ` Alex Tomas
  2006-06-09 18:10                 ` Andreas Dilger
  2006-06-09 17:12               ` Jeff Anderson-Lee
  2006-06-09 18:02               ` Andrew Morton
  3 siblings, 2 replies; 104+ messages in thread
From: Linus Torvalds @ 2006-06-09 16:54 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Jeff Garzik, Andrew Morton, ext2-devel, linux-kernel, cmm,
	linux-fsdevel, Andreas Dilger

On Fri, 9 Jun 2006, Linus Torvalds wrote:
> 
> Just as an example: ext3 _sucks_ in many ways. It has huge inodes that 
> take up way too much space in memory.

Btw, I'm not kidding you on this one.

THE NUMBER ONE MEMORY USAGE ON A LOT OF LOADS IS EXT3 INODES IN MEMORY!

And you know what? 2TB files are totally uninteresting to 99.9999% of all 
people. Most people find it _much_ more interesting to have hundreds of 
thousands of _smaller_ files instead.

So do this:

	cat /proc/slabinfo | grep ext3

and be absolutely disgusted and horrified by the size of those inodes 
already, and ask yourself whether extending the block size to 48 bits will 
help or further hurt one of the biggest problems of ext3 right now?

(And yes, I realize that block numbers are just a small part of it. The 
"vfs_inode" is also a real problem - it's got _way_ too many large 
list-heads that explode on a 64-bit kernel, for example. Oh, well. My 
point is that things like this can make a very real issue _worse_ for all 
the people who don't care one whit about it)
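
If you want a number rather than a gut reaction, here is a rough sketch
that turns that grep into bytes.  It assumes the slabinfo 2.x column
order (name, active_objs, num_objs, objsize, ...); slab_usage is a
made-up helper name and the sample line uses invented figures, not a
measurement.

```python
from typing import Dict


def slab_usage(slabinfo_text: str, needle: str = "ext3") -> Dict[str, int]:
    """Map each matching cache name to num_objs * objsize, in bytes.

    This counts object memory only and ignores per-slab overhead.
    """
    usage = {}
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header lines, and unrelated caches.
        if len(fields) < 4 or needle not in fields[0]:
            continue
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        usage[name] = num_objs * objsize
    return usage


# Invented sample in slabinfo 2.x layout, for illustration only.
SAMPLE = """slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
ext3_inode_cache  120000 120500    760    5    1 : tunables   54   27    8
dentry_cache      300000 300100    200   19    1 : tunables  120   60    8
"""

usage = slab_usage(SAMPLE)
```

On a live box you would pass in the contents of /proc/slabinfo instead
of SAMPLE.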

		Linus

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:54               ` Linus Torvalds
@ 2006-06-09 17:04                 ` Alex Tomas
  2006-06-09 17:30                   ` [Ext2-devel] " Linus Torvalds
  2006-06-09 18:10                 ` Andreas Dilger
  1 sibling, 1 reply; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 17:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Jeff Garzik, ext2-devel, linux-kernel, cmm,
	linux-fsdevel, Alex Tomas, Andreas Dilger

oops :) I don't follow that well ... 

size of in-core inodes is a different problem.

thanks, Alex

>>>>> Linus Torvalds (LT) writes:

 LT> On Fri, 9 Jun 2006, Linus Torvalds wrote:
 >> 
 >> Just as an example: ext3 _sucks_ in many ways. It has huge inodes that 
 >> take up way too much space in memory.

 LT> Btw, I'm not kidding you on this one.

 LT> THE NUMBER ONE MEMORY USAGE ON A LOT OF LOADS IS EXT3 INODES IN MEMORY!

 LT> And you know what? 2TB files are totally uninteresting to 99.9999% of all 
 LT> people. Most people find it _much_ more interesting to have hundreds of 
 LT> thousands of _smaller_ files instead.

 LT> So do this:

 LT> 	cat /proc/slabinfo | grep ext3

 LT> and be absolutely disgusted and horrified by the size of those inodes 
 LT> already, and ask yourself whether extending the block size to 48 bits will 
 LT> help or further hurt one of the biggest problems of ext3 right now?

 LT> (And yes, I realize that block numbers are just a small part of it. The 
 LT> "vfs_inode" is also a real problem - it's got _way_ too many large 
 LT> list-heads that explode on a 64-bit kernel, for example. Oh, well. My 
 LT> point is that things like this can make a very real issue _worse_ for all 
 LT> the people who don't care one whit about it)

 LT> 		Linus

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 17:04                 ` Alex Tomas
@ 2006-06-09 17:30                   ` Linus Torvalds
  2006-06-09 17:41                     ` Matthew Wilcox
  0 siblings, 1 reply; 104+ messages in thread
From: Linus Torvalds @ 2006-06-09 17:30 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Jeff Garzik, Andrew Morton, ext2-devel, linux-kernel, cmm,
	linux-fsdevel, Andreas Dilger

On Fri, 9 Jun 2006, Alex Tomas wrote:
> 
> oops :) I don't follow that well ... 
> 
> size of in-core inodes is a different problem.

Not really. It's really the same problem: adding features has a real cost.

And the cost is higher if you don't add them in a way that is statically 
separable.

So I'm not trying to make the in-core inode size be "the thing" to 
concentrate on. And I'm not saying that extents is inherently "the thing" 
that makes it sane to split up development. That time might have been a 
few years ago, or it might be in the future.

So don't get me wrong. I'm (a) generally supporting Jeff in that I think 
it makes sense to split projects off occasionally, and maybe even plan on 
hopefully having the original project deleted in the long run (it does 
actually happen, although it is fairly rare). And (b) trying to show the 
costs.

For me, the biggest cost tends to actually be support. A stable filesystem 
that is used by thousands and thousands of people and that isn't actually 
developed outside of just maintaining it IS A REALLY GOOD THING TO HAVE. 

And I'm not saying that just because it's a filesystem, and people get 
upset if they lose data. No, I'm saying it because from a maintenance 
standpoint, such a filesystem has almost zero cost.

So from a maintenance standpoint, it's actually a _lot_ more useful to me 
(and probably to a lot of other people) if development is done as its own 
project, and is merged as its own sub-project. When problems happen, it's 
fairly obvious what they are, and it's very much a case of all the people 
involved having made that choice ("Hey, you knew it wasn't as stable, but 
you wanted it for your special needs").

As an additional bonus, it tends to help find patterns in bug-reports 
("ahh, everyone involved is running ext4"). So not only does it not affect 
people who don't want to be affected, it also helps _pinpoint_ where 
problems are when they do happen.

Also, if it turns out that the stabilization thing worked well, and after 
a few years the _new_ code hasn't gotten any changes, and there are no 
other real downsides either, they can actually be merged later on too. 

That's what we're seeing in the 64-bit architecture support on both s390 
and powerpc (and maybe even x86, eventually? Possibly not, but who 
knows..). But that's a separate issue.

			Linus

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 17:30                   ` [Ext2-devel] " Linus Torvalds
@ 2006-06-09 17:41                     ` Matthew Wilcox
  2006-06-09 18:04                       ` [Ext2-devel] " Linus Torvalds
  2006-06-09 18:17                       ` Michael Poole
  0 siblings, 2 replies; 104+ messages in thread
From: Matthew Wilcox @ 2006-06-09 17:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Jeff Garzik, ext2-devel, linux-kernel, cmm,
	linux-fsdevel, Alex Tomas, Andreas Dilger

On Fri, Jun 09, 2006 at 10:30:06AM -0700, Linus Torvalds wrote:
> And I'm not saying that just because it's a filesystem, and people get 
> upset if they lose data. No, I'm saying it because from a maintenance 
> standpoint, such a filesystem has almost zero cost.

One of the costs (and I'm not disagreeing with your main point;
I think forking ext3 to ext4 at this point is reasonable), is that
bugfixes applied to one don't necessarily get applied to the other.
I found some recently between ext2 and ext3, and submitted those, but I
only audited one file.  There's lots more to look at and I just haven't
found the time recently.  Going to three variations is a lot more work
for auditing, and it might be worth splitting some bits which genuinely
are the same into common code.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 17:41                     ` Matthew Wilcox
@ 2006-06-09 18:04                       ` Linus Torvalds
  2006-06-09 18:17                       ` Michael Poole
  1 sibling, 0 replies; 104+ messages in thread
From: Linus Torvalds @ 2006-06-09 18:04 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alex Tomas, Jeff Garzik, Andrew Morton, ext2-devel, linux-kernel,
	cmm, linux-fsdevel, Andreas Dilger

On Fri, 9 Jun 2006, Matthew Wilcox wrote:
> 
> One of the costs (and I'm not disagreeing with your main point;
> I think forking ext3 to ext4 at this point is reasonable), is that
> bugfixes applied to one don't necessarily get applied to the other.

I agree. However, that tends to be less of an issue if you fork off a 
stable base (which isn't always the case). Forking off something that is 
still being actively developed is a different matter entirely. I don't 
think ext3 is in that situation, really.

Also, one of the issues is when there are big VFS layer changes, which 
affect all filesystems. Then, a lot of people will think that it's easier 
to fix up one unified filesystem than it is to fix up five separate ones, 
and the fact is, that's often _not_ the case.

The unified filesystem potentially has so much crud and crap and other 
issues that it ends up being much more work to understand and fix it up 
than it would have been to do the same thing for five different 
filesystems that didn't play a lot of games and have complex

  "if this flag is set, do this code, otherwise do that code, and this 
   whole directory reading code btw has a static CONFIG_EXT3_INDEX thing, 
   so you won't even know if you caught all the interface changes when you 
   get a clean compile"

So I'm not a huge believer in "shared code is good code". I believe shared 
code is good only if it has no conditionals.

Ie the VFS-layer kind of code that acts the SAME for everybody is the good 
kind of sharing. The kind where you call into different routines that will 
do different things depending on a flag (which may not even be obvious to 
the caller) is usually the _bad_ kind of sharing, because that's the kind 
of code that ends up working for one user and not working for another, and 
trying to make it work for both may be fundamentally hard.

The

	if (sb->option.extent) 
		.. do one thing ..
	else
		.. do another ..

kind of thing is exactly what leads to problems later. Even if it allows 
sharing of 90% of the code (the caller of the function), it leads to 
problems exactly because of things that end up not quite working because 
people only tested one code-path, and it broke the other case in some 
really subtle way.
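
A minimal sketch of the pattern being criticised above, under stated
assumptions: struct toy_sb, its use_extents field and the two mapping
helpers are hypothetical stand-ins, not the real ext3 structures or
functions. The shared caller looks identical for every filesystem, yet
which branch actually runs depends on a flag the caller may never see.

```c
#include <stdbool.h>

/* Hypothetical, simplified superblock; not the real ext3 structure. */
struct toy_sb {
	bool use_extents;	/* stands in for the sb->option.extent flag */
};

/* Hypothetical per-layout helpers returning a fake physical block. */
static unsigned long map_block_indirect(unsigned long logical)
{
	return 1000 + logical;	/* pretend indirect-block lookup */
}

static unsigned long map_block_extent(unsigned long logical)
{
	return 5000 + logical;	/* pretend extent-tree lookup */
}

/*
 * Shared entry point: the code around the call is common, but the
 * behaviour is selected by a per-filesystem flag at run time.
 */
unsigned long toy_map_block(const struct toy_sb *sb, unsigned long logical)
{
	if (sb->use_extents)
		return map_block_extent(logical);
	else
		return map_block_indirect(logical);
}
```

A test run that only ever uses filesystems with use_extents set never
executes map_block_indirect() at all, so a change that breaks that path
can still pass such a test.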

		Linus

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 17:41                     ` Matthew Wilcox
  2006-06-09 18:04                       ` [Ext2-devel] " Linus Torvalds
@ 2006-06-09 18:17                       ` Michael Poole
  1 sibling, 0 replies; 104+ messages in thread
From: Michael Poole @ 2006-06-09 18:17 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linus Torvalds, Alex Tomas, Jeff Garzik, Andrew Morton,
	ext2-devel, linux-kernel, cmm, linux-fsdevel, Andreas Dilger

Matthew Wilcox writes:

> On Fri, Jun 09, 2006 at 10:30:06AM -0700, Linus Torvalds wrote:
> > And I'm not saying that just because it's a filesystem, and people get 
> > upset if they lose data. No, I'm saying it because from a maintenance 
> > standpoint, such a filesystem has almost zero cost.
> 
> One of the costs (and I'm not disagreeing with your main point;
> I think forking ext3 to ext4 at this point is reasonable), is that
> bugfixes applied to one don't necessarily get applied to the other.
> I found some recently between ext2 and ext3, and submitted those, but I
> only audited one file.  There's lots more to look at and I just haven't
> found the time recently.  Going to three variations is a lot more work
> for auditing, and it might be worth splitting some bits which genuinely
> are the same into common code.

If you want more details on this kind of issue, look at CP-Miner.  A
paper published earlier this year in IEEE TSE[1] reports that that
tool found 421 cut-and-paste-related possible bugs in Linux, of which
49 were real bugs, 249 were false positives, and 123 could not be
proven either true or false positives.

[1]- http://doi.ieeecomputersociety.org/10.1109/TSE.2006.28

Michael Poole

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:54               ` Linus Torvalds
  2006-06-09 17:04                 ` Alex Tomas
@ 2006-06-09 18:10                 ` Andreas Dilger
  2006-06-09 18:22                   ` Linus Torvalds
                                     ` (2 more replies)
  1 sibling, 3 replies; 104+ messages in thread
From: Andreas Dilger @ 2006-06-09 18:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alex Tomas, Jeff Garzik, Andrew Morton, ext2-devel, linux-kernel,
	cmm, linux-fsdevel

On Jun 09, 2006  09:25 -0700, Linus Torvalds wrote:
> So two separate filesystems are _less_ to maintain than one big one. Even
> if there's a lot of code that -could- be shared.

That is true if people are willing to maintain both trees.  I think that
even with the current ext2/ext3 split there are continually fixes that are
missing from one filesystem or another.

> Just as an example: ext3 _sucks_ in many ways. It has huge inodes that
> take up way too much space in memory. It has absolutely disgusting code to
> handle directory reading and writing (buffer heads! In 2006!).

My point exactly!  The ext2 directory code was moved from buffer heads to
page cache by Al after ext3 was forked and the code was never fixed in ext3.

I don't see this getting any better if there is an ext4 filesystem and all
of the ext3 developers are only interested in maintaining ext4.  Look at
reiserfs - it is completely abandoned by Hans in favour of reiser4 (the
entry in MAINTAINERS notwithstanding) except for Chris Mason at SuSE.

Having a single codebase for everyone means that it is continually maintained
and users of ext3 aren't left out in the cold.

On Jun 09, 2006  09:54 -0700, Linus Torvalds wrote:
> Btw, I'm not kidding you on this one.
> 
> THE NUMBER ONE MEMORY USAGE ON A LOT OF LOADS IS EXT3 INODES IN MEMORY!

Do you think that would be any different with a new filesystem?

> And you know what? 2TB files are totally uninteresting to 99.9999% of all 
> people. Most people find it _much_ more interesting to have hundreds of 
> thousands of _smaller_ files instead.
> 
> So do this:
> 
> 	cat /proc/slabinfo | grep ext3

# head -2 /proc/slabinfo
slabinfo - version: 2.1
name       <active_objs> <num_objs> <objsize> <objperslab>

# grep ext2 /proc/slabinfo
ext2_inode_cache       0          0       572            7
ext2_xattr             0          0        48           81

# grep ext3 /proc/slabinfo
ext3_inode_cache   30207      41418       616            6
ext3_xattr             0          0        48           81

# grep xfs /proc/slabinfo
xfs_ili             2558       2576       140           28
xfs_inode           2558       2565       448            9

# grep jfs /proc/slabinfo
jfs_ip                 0          0      1048            3

So, the ext3 inode could grow another ~50 bytes without changing the
slab allocation size ;-), and in fact other filesystems aren't noticeably
different.
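
A rough sketch of the arithmetic behind that headroom estimate. It
assumes a 4096-byte slab (an assumption, though one consistent with the
<objperslab> column above) and ignores per-slab bookkeeping and
alignment, so it gives an upper bound rather than the exact figure. The
function names are made up for illustration.

```c
#include <stddef.h>

/* Objects of obj_size bytes that fit in one slab, ignoring overhead. */
size_t objs_per_slab(size_t slab_bytes, size_t obj_size)
{
	if (obj_size == 0)
		return 0;
	return slab_bytes / obj_size;
}

/* Largest object size that still packs the same number per slab. */
size_t max_size_for_same_count(size_t slab_bytes, size_t obj_size)
{
	size_t n = objs_per_slab(slab_bytes, obj_size);

	if (n == 0)
		return 0;
	return slab_bytes / n;
}
```

With the ext3_inode_cache numbers above, objs_per_slab(4096, 616) is 6
and max_size_for_same_count(4096, 616) is 682, i.e. at most 66 bytes of
headroom before only 5 objects fit. Real slab overhead would reduce
that, which is in the same ballpark as the ~50 bytes quoted.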

> and be absolutely disgusted and horrified by the size of those inodes 
> already, and ask yourself whether extending the block size to 48 bits will 
> help or further hurt one of the biggest problems of ext3 right now?

This is then the biggest problem of all filesystems.

> (And yes, I realize that block numbers are just a small part of it. The 
> "vfs_inode" is also a real problem - it's got _way_ too many large 
> list-heads that explode on a 64-bit kernel, for example. Oh, well.

On a 32-bit system the vfs_inode is more than half of the size of the ext3
inode, it is worse on 64-bit systems.
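
To illustrate why list heads grow on 64-bit: a list head is just two
pointers, so its size doubles when pointers go from 4 to 8 bytes. The
struct list_head below has the same shape as the kernel's; struct
toy_inode and its field names are hypothetical and far smaller than the
real vfs_inode.

```c
#include <stddef.h>

/* Same shape as the kernel's doubly linked list node: two pointers. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical inode-like struct embedding several list heads. */
struct toy_inode {
	struct list_head hash_list;
	struct list_head lru_list;
	struct list_head sb_list;
	struct list_head dentry_list;
	unsigned int mode;
};

/* Bytes taken by a single list head on the build target. */
size_t list_head_bytes(void)
{
	return sizeof(struct list_head);
}

/* Bytes taken by the four list heads embedded in toy_inode. */
size_t toy_inode_list_bytes(void)
{
	return 4 * sizeof(struct list_head);
}
```

On a typical 32-bit build the four list heads take 32 bytes; on a
typical 64-bit build they take 64, with no change to the filesystem
code itself.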

> My point is that things like this can make a very real issue _worse_ for all 
> the people who don't care one whit about it)

The current group of changes will be a no-op if CONFIG_LBD isn't enabled,
and I think I argued fairly strongly to also have a CONFIG_ flag to allow
larger than 2TB file support only for those users that want it.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:10                 ` Andreas Dilger
@ 2006-06-09 18:22                   ` Linus Torvalds
  2006-06-09 18:30                     ` Alex Tomas
  2006-06-09 18:40                   ` Jeff Garzik
  2006-06-09 18:41                   ` Jeff Garzik
  2 siblings, 1 reply; 104+ messages in thread
From: Linus Torvalds @ 2006-06-09 18:22 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Alex Tomas, Jeff Garzik, Andrew Morton, ext2-devel, linux-kernel,
	cmm, linux-fsdevel

On Fri, 9 Jun 2006, Andreas Dilger wrote:
> missing from one filesystem or another.
> 
> > Just as an example: ext3 _sucks_ in many ways. It has huge inodes that
> > take up way too much space in memory. It has absolutely disgusting code to
> > handle directory reading and writing (buffer heads! In 2006!).
> 
> My point exactly!  The ext2 directory code was moved from buffer heads to
> page cache by Al after ext3 was forked and the code was never fixed in ext3.

The code was never fixed in ext3, because ext3 is a pig in that area.

You misunderstand how this worked.

The reason ext2 got fixed was that ext2 was _simple_. It got fixed 
_despite_ the fact that it's not all that widely used any more, and not 
considered a really important filesystem. It got fixed because it wasn't 
too bad. It doesn't have all the crud that makes it a much more involved 
thing to do for ext3.

So if the ext2/3 split hadn't happened, _neither_ of them would be fixed.

See?

My point is, maintaining two different pieces is SIMPLER.

Even if that simplicity sometimes ends up meaning "not maintaining the 
other one".

So being out of sync is not a problem. It's a _feature_. 

> On Jun 09, 2006  09:54 -0700, Linus Torvalds wrote:
> > Btw, I'm not kidding you on this one.
> > 
> > THE NUMBER ONE MEMORY USAGE ON A LOT OF LOADS IS EXT3 INODES IN MEMORY!
> 
> Do you think that would be any different with a new filesystem?

It would be bigger, if you made ext3 do 48-bit block numbers.

See? ext3 would become strictly _worse_ for the majority of users, who 
wouldn't get any advantage. That's my point.

> So, the ext3 inode could grow another ~50 bytes without changing the
> slab allocation size ;-), and in fact other filesystems aren't noticeably
> different.

Yes, I already pointed out that the biggest part of it was actually the 
vfs_inode thing.

And btw, growing more than 50 bytes is exactly what it would do. Go look.

> This is then the biggest problem of all filesystems.

Yeah, under many loads it is. We do really badly with lots of metadata in 
memory. Why do you think people have historically complained about things 
like the updatedb flushing their disk cache?

If you look at disk access patterns, one of _the_ biggest problems is not 
in reading individual files. It's in inode atime updates and the other 
"stupid crap" stuff.

> On a 32-bit system the vfs_inode is more than half of the size of the ext3
> inode, it is worse on 64-bit systems.

..which I pointed out, and doesn't change my point one _whit_. 

The fact that the block numbers aren't the _only_ problem doesn't suddenly 
mean they are problem-free, does it?

			Linus

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:22                   ` Linus Torvalds
@ 2006-06-09 18:30                     ` Alex Tomas
  2006-06-09 18:38                       ` Linus Torvalds
                                         ` (2 more replies)
  0 siblings, 3 replies; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 18:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Jeff Garzik, ext2-devel, linux-kernel, cmm,
	linux-fsdevel, Alex Tomas, Andreas Dilger

>>>>> Linus Torvalds (LT) writes:
 LT> My point is, maintaining two different pieces is SIMPLER.

"different" is a key word here. why should we copy most of ext3 code
into ext4?

 LT> It would be bigger, if you made ext3 do 48-bit block numbers.

nope, we re-use existing i_data w/o any changes. yes, we've made
inode a bit larger to cache last found extent. this noticeably
improves performance in some workloads, though.

 LT> See? ext3 would become strictly _worse_ for the majority of users, who 
 LT> wouldn't get any advantage. That's my point.

would "#if CONFIG_EXT3_EXTENTS" be a good solution then?

thanks. Alex

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:30                     ` Alex Tomas
@ 2006-06-09 18:38                       ` Linus Torvalds
  2006-06-09 18:50                         ` [Ext2-devel] " Chase Venters
  2006-06-09 18:43                       ` Jeff Garzik
  2006-06-09 18:50                       ` Diego Calleja
  2 siblings, 1 reply; 104+ messages in thread
From: Linus Torvalds @ 2006-06-09 18:38 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Andrew Morton, Jeff Garzik, ext2-devel, linux-kernel, cmm,
	linux-fsdevel, Andreas Dilger



On Fri, 9 Jun 2006, Alex Tomas wrote:
> 
> would "#if CONFIG_EXT3_EXTENTS" be a good solution then?

Let's put it this way:
 - have you had _any_ valid argument at all against "ext4"?

Think about it. Honestly. Tell me anything that doesn't work?

		Linus

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:38                       ` Linus Torvalds
@ 2006-06-09 18:50                         ` Chase Venters
  2006-06-09 19:00                           ` Chase Venters
                                             ` (2 more replies)
  0 siblings, 3 replies; 104+ messages in thread
From: Chase Venters @ 2006-06-09 18:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alex Tomas, Andreas Dilger, Jeff Garzik, Andrew Morton,
	ext2-devel, linux-kernel, cmm, linux-fsdevel

On Fri, 9 Jun 2006, Linus Torvalds wrote:

>
>
> On Fri, 9 Jun 2006, Alex Tomas wrote:
>>
>> would "#if CONFIG_EXT3_EXTENTS" be a good solution then?
>
> Let's put it this way:
> - have you had _any_ valid argument at all against "ext4"?
>
> Think about it. Honestly. Tell me anything that doesn't work?

It's about bundling. It's about being able to take your 3-year old 
dependable car and make it faster by bolting on new manifolds and 
turbochargers, rather than waiting a year for the manufacturer to release 
a totally new model (and buying totally new cars often means you're part 
of the manufacturer's debugging group, so be prepared to have things fail 
which require warranty work).

Now, granted, I really do agree with you about the whole code sharing 
thing. A fresh start is often just what you need. I'm just questioning if 
it wouldn't be better to do this fresh start immediately after going 
48-bit, rather than before. That way, existing users that want that extra 
umph can have it today.

> 		Linus

Cheers,
Chase

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:50                         ` [Ext2-devel] " Chase Venters
@ 2006-06-09 19:00                           ` Chase Venters
  2006-06-10 13:33                             ` Adrian Bunk
  2006-06-09 19:01                           ` Jeff Garzik
  2006-06-09 19:21                           ` Alan Cox
  2 siblings, 1 reply; 104+ messages in thread
From: Chase Venters @ 2006-06-09 19:00 UTC (permalink / raw)
  To: Chase Venters
  Cc: Linus Torvalds, Alex Tomas, Andreas Dilger, Jeff Garzik,
	Andrew Morton, ext2-devel, linux-kernel, cmm, linux-fsdevel

On Fri, 9 Jun 2006, Chase Venters wrote:

> On Fri, 9 Jun 2006, Linus Torvalds wrote:
>
>> 
>>
>>  On Fri, 9 Jun 2006, Alex Tomas wrote:
>> > 
>> >  would "#if CONFIG_EXT3_EXTENTS" be a good solution then?
>>
>>  Let's put it this way:
>>  - have you had _any_ valid argument at all against "ext4"?
>>
>>  Think about it. Honestly. Tell me anything that doesn't work?
>
> Now, granted, I really do agree with you about the whole code sharing thing. 
> A fresh start is often just what you need. I'm just questioning if it 
> wouldn't be better to do this fresh start immediately after going 48-bit, 
> rather than before. That way, existing users that want that extra umph can 
> have it today.
>

Let me clarify that I don't have a final answer or opinion for whether or 
not 48-bit belongs in ext3 or ext4. But I'm trying to illustrate that it's an 
important question to raise.

In Group A we have some number of users that must have 48-bit support by 
Date B. 48-bit support could be available in ext3 by Date A, before Date 
B. It could also be available in ext4 by Date X, along with a handful of 
other features.

Is Date X before Date B? If it's not, is it worth telling Group A to 
suffer for a while, or asking them to use ext4 before it's ready? These 
are the questions I'd have to know the answers to if I were the one 
casting a final decision.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 19:00                           ` Chase Venters
@ 2006-06-10 13:33                             ` Adrian Bunk
  0 siblings, 0 replies; 104+ messages in thread
From: Adrian Bunk @ 2006-06-10 13:33 UTC (permalink / raw)
  To: Chase Venters
  Cc: Linus Torvalds, Alex Tomas, Andreas Dilger, Jeff Garzik,
	Andrew Morton, ext2-devel, linux-kernel, cmm, linux-fsdevel

On Fri, Jun 09, 2006 at 02:00:15PM -0500, Chase Venters wrote:
> On Fri, 9 Jun 2006, Chase Venters wrote:
> 
> >On Fri, 9 Jun 2006, Linus Torvalds wrote:
> >
> >>
> >>
> >> On Fri, 9 Jun 2006, Alex Tomas wrote:
> >>> 
> >>>  would "#if CONFIG_EXT3_EXTENTS" be a good solution then?
> >>
> >> Let's put it this way:
> >> - have you had _any_ valid argument at all against "ext4"?
> >>
> >> Think about it. Honestly. Tell me anything that doesn't work?
> >
> >Now, granted, I really do agree with you about the whole code sharing 
> >thing. A fresh start is often just what you need. I'm just questioning if 
> >it wouldn't be better to do this fresh start immediately after going 
> >48-bit, rather than before. That way, existing users that want that extra 
> >umph can have it today.
> >
> 
> Let me clarify that I don't have a final answer or opinion for whether or 
> not 48-bit belongs in ext3 or ext4. But I'm trying to illustrate that it's 
> an important question to raise.
> 
> In Group A we have some number of users that must have 48-bit support by 
> Date B. 48-bit support could be available in ext3 by Date A, before Date 
> B. It could also be available in ext4 by Date X, along with a handful of 
> other features.
> 
> Is Date X before Date B? If it's not, is it worth telling Group A to 
> suffer for a while, or asking them to use ext4 before it's ready? These 
> are the questions I'd have to know the answers to if I were the one 
> casting a final decision.

There are many points mentioned in this discussion like:
- possibility of regressions for existing users
- time until the new code is actually stable and well-tested
- long-term maintainability

The faster availability is a point, but it's only one amongst many 
points.

And it's not that we are talking about a feature not yet available in 
Linux at all. Instead of suffering, couldn't the few people in urgent 
need of 48-bit support use JFS or XFS?

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:50                         ` [Ext2-devel] " Chase Venters
  2006-06-09 19:00                           ` Chase Venters
@ 2006-06-09 19:01                           ` Jeff Garzik
  2006-06-10 19:27                             ` Kyle Moffett
  2006-06-09 19:21                           ` Alan Cox
  2 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 19:01 UTC (permalink / raw)
  To: Chase Venters
  Cc: Andrew Morton, ext2-devel, linux-kernel, Linus Torvalds, cmm,
	linux-fsdevel, Alex Tomas, Andreas Dilger

Chase Venters wrote:
> Now, granted, I really do agree with you about the whole code sharing 
> thing. A fresh start is often just what you need. I'm just questioning 
> if it wouldn't be better to do this fresh start immediately after going 
> 48-bit, rather than before. That way, existing users that want that 
> extra umph can have it today.

Then you continue to crap up the code with

	if (48bit)
		...
	else
		...

etc.

The proper way to do this is "cp -a ext3 ext4" (excluding JBD as Andrew 
mentioned), and then let evolution take its course.

"Evolution" means the standard Linux development -- patch the kernel, 
patch e4fsprogs, test, lather rinse repeat.  The best development 
platform for new features is one that _works_, and keeps working.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 19:01                           ` Jeff Garzik
@ 2006-06-10 19:27                             ` Kyle Moffett
  2006-06-10 19:44                               ` Linus Torvalds
  0 siblings, 1 reply; 104+ messages in thread
From: Kyle Moffett @ 2006-06-10 19:27 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, ext2-devel, linux-kernel, Chase Venters,
	Linus Torvalds, cmm, linux-fsdevel, Alex Tomas, Andreas Dilger

On Jun 9, 2006, at 15:01:20, Jeff Garzik wrote:
> Chase Venters wrote:
>> Now, granted, I really do agree with you about the whole code  
>> sharing thing. A fresh start is often just what you need. I'm just  
>> questioning if it wouldn't be better to do this fresh start  
>> immediately after going 48-bit, rather than before. That way,  
>> existing users that want that extra umph can have it today.
>
> Then you continue to crap up the code with
>
> 	if (48bit)
> 		...
> 	else
> 		...
>
> etc.
>
> The proper way to do this is "cp -a ext3 ext4" (excluding JBD as  
> Andrew mentioned), and then let evolution take its course.

Why not: "extX_ops.do_something_useful();", then have
fs/ext/ext{2,3,4}_ops.c which implement those various operations just like we  
do for the Virtual Filesystem Switch?  Much as there are  
commonalities between all filesystems that get moved into the VFS;  
perhaps we should have a Virtual Ext Filesystem Switch (VEFS?  
VextFS?) which abstracts out the commonalities between the evolving  
ext{2,3} code and data format?  Such code would also provide a  
library of common routines which could be used to implement other  
specialized filesystems in the future.  Imagine a cluster-extfs which  
reuses some of the core extXfs code despite changing the on-disk  
format considerably!
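
A minimal sketch of what such an ops-table split could look like,
assuming nothing about the real ext2/ext3 code: struct toy_ext_ops, the
toy_ext3_ops and toy_ext4_ops tables and the mapping functions are all
hypothetical names made up for illustration.

```c
#include <stddef.h>

/* Hypothetical per-variant operations table, in the spirit of the VFS. */
struct toy_ext_ops {
	const char *name;
	unsigned long (*map_block)(unsigned long logical);
};

/* Variant-specific implementations; each lives in its own code path. */
static unsigned long toy_ext3_map_block(unsigned long logical)
{
	return 1000 + logical;	/* pretend indirect-block lookup */
}

static unsigned long toy_ext4_map_block(unsigned long logical)
{
	return 5000 + logical;	/* pretend extent-tree lookup */
}

const struct toy_ext_ops toy_ext3_ops = { "ext3", toy_ext3_map_block };
const struct toy_ext_ops toy_ext4_ops = { "ext4", toy_ext4_map_block };

/*
 * Common code contains no "if (flag)" at all: it only calls through
 * the table that the variant registered.
 */
unsigned long toy_common_lookup(const struct toy_ext_ops *ops,
				unsigned long logical)
{
	return ops->map_block(logical);
}
```

The design point is that the common routine behaves the same for every
caller, and each variant's behaviour sits behind one explicit function
pointer rather than being scattered across conditionals.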

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-10 19:27                             ` Kyle Moffett
@ 2006-06-10 19:44                               ` Linus Torvalds
  2006-06-10 20:02                                 ` [Ext2-devel] " Linus Torvalds
  0 siblings, 1 reply; 104+ messages in thread
From: Linus Torvalds @ 2006-06-10 19:44 UTC (permalink / raw)
  To: Kyle Moffett
  Cc: Andrew Morton, Jeff Garzik, ext2-devel, linux-kernel,
	Chase Venters, cmm, linux-fsdevel, Alex Tomas, Andreas Dilger

On Sat, 10 Jun 2006, Kyle Moffett wrote:
> 
> Why not: "extX_ops.do_something_useful();", then have fs/ext/ext{2,3,4}_ops.c

I think that kind of setup is hugely preferable to conditionals in the 
code, if only because it tends to force people to do the abstractions 
right, and make the code sequences independent.

I just don't think it's necessarily very realistic - it's _hard_ to 
refactor code well. It also doesn't buy you hardly anything at all, since 
the people who are interested in ext2 are usually not very interested in 
sharing code with ext3. The filesystems simply aren't that similar, apart 
from the layout. 

ext2 is half the size of ext3, and that's ignoring JBD entirely.

That constant growth, btw, is one reason why splitting off legacy 
filesystems is often a good idea. What do you want to bet that the 2000+ 
line difference RIGHT NOW in ext3/ext4 will grow in the future? Splitting 
things off means that people who don't care about the new features can 
just stay with a stable base and also avoid the bloat. Exactly the way you 
can stay with ext2 on an old machine, and avoid the bloat of ext3.

There's also nothing that says that legacy filesystems cannot be 
simplified. For example, it's perfectly realistic to say that ext3 (as a 
legacy filesystem) doesn't support resizing, and simply ripping that part 
out of it. The people who don't want the bloat will be happy. The people 
who want the feature can move to ext4.

See? Splitting development is what allows you to make choices that you 
simply otherwise don't _have_. 

			Linus

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-10 19:44                               ` Linus Torvalds
@ 2006-06-10 20:02                                 ` Linus Torvalds
  0 siblings, 0 replies; 104+ messages in thread
From: Linus Torvalds @ 2006-06-10 20:02 UTC (permalink / raw)
  To: Kyle Moffett
  Cc: Jeff Garzik, Chase Venters, Alex Tomas, Andreas Dilger,
	Andrew Morton, ext2-devel, linux-kernel, cmm, linux-fsdevel

On Sat, 10 Jun 2006, Linus Torvalds wrote:
> 
> ext2 is half the size of ext3, and that's ignoring JBD entirely.

Btw, let me say again that I'm fairly neutral on any particular individual 
feature (ie the 48-bit thing doesn't actually move me all that much in 
itself), but that from a maintenance standpoint, I think splitting off 
filesystems and drivers has been a _huge_ success.

Starting from scratch - even if you literally start from the same 
code-base - and allowing the old functionality to remain undisturbed is 
just a very nice model. Yeah, yeah, it has some diskspace cost (although 
at least from a git perspective, even that isn't really true), but we've 
seen both in drivers and in filesystems how splitting things up has been a 
great thing to do.

Sometimes it's a great thing just because five years later, it turns out 
that nobody even uses the legacy thing, and you decide to at that point 
just remove the driver (or filesystem, but so far it's never been the 
case for filesystems even if smbfs is a potential victim of this in the 
not _too_ distant future), because the new version simply does everything 
better.

And that's _not_ a failure of the model. It's a success too. But so is the 
above commentary on ext2, when the "old driver/filesystem is still used 
and maintained by odd people". It's just two different possible outcomes 
of the decision to do development separately from an older user base.

And again, I'd like to stress the _user_base_ over the _code_base_. In 
many ways, that's the much more important split. I suspect Jeff has seen 
this in drivers, where a lot of users simply do not want to have a new 
driver, because it does some huge fundamental improvement for new users 
but doesn't work for old ethernet cards, for example, because it missed 
some old use case that depended on a legacy feature that just doesn't fit well 
into the new (and obviously improved) world-view.

So we've often seen a driver that _could_ have handled different versions 
of the same card/chip split into an "old" and a "new" driver, and on the 
whole it has always been positive - even if eventually the old driver just 
becomes irrelevant for one reason or another.

Duplication isn't actually bad. It's what often allows experimentation, 
and streamlining. In drivers, for example, duplication is _often_ done as 
part of simply dropping support for old cards in the new version, but also 
by dropping and simplifying the old driver that now has a much clearer 
"raison d'etre", aka "user base".

Which gets me back to the whole "'user base' matters more than 'code 
base'" argument, because it's literally the user base that determines 
development (or lack of it - non-development is often the big reason for a 
user base, as anybody who works for a distribution maintainer should know 
intimately).

			Linus

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:50                         ` [Ext2-devel] " Chase Venters
  2006-06-09 19:00                           ` Chase Venters
  2006-06-09 19:01                           ` Jeff Garzik
@ 2006-06-09 19:21                           ` Alan Cox
  2006-06-09 19:13                             ` [Ext2-devel] " Chase Venters
  2006-06-09 19:24                             ` Alex Tomas
  2 siblings, 2 replies; 104+ messages in thread
From: Alan Cox @ 2006-06-09 19:21 UTC (permalink / raw)
  To: Chase Venters
  Cc: Andrew Morton, Jeff Garzik, ext2-devel, linux-kernel,
	Linus Torvalds, cmm, linux-fsdevel, Alex Tomas, Andreas Dilger

On Fri, 2006-06-09 at 13:50 -0500, Chase Venters wrote:
> It's about bundling. It's about being able to take your 3-year old 
> dependable car and make it faster by bolting on new manifolds and 
> turbochargers, rather than waiting a year for the manufacturer to release 
> a totally new model

Unfortunately in the software case if you want it in the base kernel you
are bolting new manifolds on everyone's car at once, and someone is going
to have an engine explode as a result.

Ext3 already has enough back compatibility that you can replace the
engine with a horse, we don't need any more in it thank you.

Alan

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 19:21                           ` Alan Cox
@ 2006-06-09 19:13                             ` Chase Venters
  2006-06-09 19:24                             ` Alex Tomas
  1 sibling, 0 replies; 104+ messages in thread
From: Chase Venters @ 2006-06-09 19:13 UTC (permalink / raw)
  To: Alan Cox
  Cc: Chase Venters, Linus Torvalds, Alex Tomas, Andreas Dilger,
	Jeff Garzik, Andrew Morton, ext2-devel, linux-kernel, cmm,
	linux-fsdevel

On Fri, 9 Jun 2006, Alan Cox wrote:

> On Fri, 2006-06-09 at 13:50 -0500, Chase Venters wrote:
>> It's about bundling. It's about being able to take your 3-year old
>> dependable car and make it faster by bolting on new manifolds and
>> turbochargers, rather than waiting a year for the manufacturer to release
>> a totally new model
>
> Unfortunately in the software case if you want it in the base kernel you
> are bolting new manifolds on everyone's car at once, and someone is going
> to have an engine explode as a result.

Someone _could_ have an engine explode... it's perfectly possible though 
that a well-tested 48-bit patch wouldn't cause anyone's ext3 to explode. 
(After all, the vehicle analogy breaks down here - software doesn't get 
worn out from being run at redline for too many miles.)

> Ext3 already has enough back compatibility that you can replace the
> engine with a horse, we don't need any more in it thank you.

But just what are the costs of calling it quits now? Are we going to deny 
users something they need?

>
> Alan
>

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 19:21                           ` Alan Cox
  2006-06-09 19:13                             ` [Ext2-devel] " Chase Venters
@ 2006-06-09 19:24                             ` Alex Tomas
  2006-06-09 19:25                               ` Jeff Garzik
  1 sibling, 1 reply; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 19:24 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andrew Morton, Jeff Garzik, ext2-devel, linux-kernel,
	Chase Venters, Linus Torvalds, cmm, linux-fsdevel, Alex Tomas,
	Andreas Dilger

>>>>> Alan Cox (AC) writes:

 AC> Unfortunately in the software case if you want it in the base kernel you
 AC> are bolting new manifolds on everyone's car at once, and someone is going
 AC> to have an engine explode as a result.

please, don't forget you need to enable it by mount option.

thanks, Alex

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 19:24                             ` Alex Tomas
@ 2006-06-09 19:25                               ` Jeff Garzik
  2006-06-09 19:35                                 ` Alex Tomas
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 19:25 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Andrew Morton, ext2-devel, linux-kernel, Chase Venters,
	Linus Torvalds, cmm, linux-fsdevel, Andreas Dilger, Alan Cox

Alex Tomas wrote:
>>>>>> Alan Cox (AC) writes:
> 
>  AC> Unfortunately in the software case if you want it in the base kernel you
>  AC> are bolting new manifolds on everyone's car at once, and someone is going
>  AC> to have an engine explode as a result.
> 
> please, don't forget you need to enable it by mount option.

Irrelevant.  That's a development-only situation.  It will be enabled by 
default eventually, and should be considered in that light.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 19:25                               ` Jeff Garzik
@ 2006-06-09 19:35                                 ` Alex Tomas
  2006-06-09 19:35                                   ` [Ext2-devel] " Jeff Garzik
  2006-06-11 20:14                                   ` grundig
  0 siblings, 2 replies; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 19:35 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, ext2-devel, linux-kernel, Chase Venters,
	Linus Torvalds, cmm, linux-fsdevel, Alex Tomas, Andreas Dilger,
	Alan Cox

>>>>> Jeff Garzik (JG) writes:

 JG> Irrelevant.  That's a development-only situation.  It will be enabled
 JG> by default eventually, and should be considered in that light.

that's your point of view. mine is that this option (and code)
is to be used only when needed.

thanks, Alex

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 19:35                                 ` Alex Tomas
@ 2006-06-09 19:35                                   ` Jeff Garzik
  2006-06-11 20:14                                   ` grundig
  1 sibling, 0 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 19:35 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Alan Cox, Chase Venters, Linus Torvalds, Andreas Dilger,
	Andrew Morton, ext2-devel, linux-kernel, cmm, linux-fsdevel

Alex Tomas wrote:
>>>>>> Jeff Garzik (JG) writes:
> 
>  JG> Irrelevant.  That's a development-only situation.  It will be enabled
>  JG> by default eventually, and should be considered in that light.
> 
> that's your point of view. mine is that this option (and code)
> is to be used only when needed.

Regardless of any use "when needed," the code is in the codebase, and is 
thus the "if (metadata_v2) ... else ..." maintenance burden that has 
been discussed.

	Jeff




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 19:35                                 ` Alex Tomas
  2006-06-09 19:35                                   ` [Ext2-devel] " Jeff Garzik
@ 2006-06-11 20:14                                   ` grundig
  1 sibling, 0 replies; 104+ messages in thread
From: grundig @ 2006-06-11 20:14 UTC (permalink / raw)
  To: Alex Tomas
  Cc: jeff, alex, alan, chase.venters, torvalds, adilger, akpm,
	ext2-devel, linux-kernel, cmm, linux-fsdevel

On Fri, 09 Jun 2006 23:35:43 +0400,
Alex Tomas <alex@clusterfs.com> wrote:

> >>>>> Jeff Garzik (JG) writes:
> 
>  JG> Irrelevant.  That's a development-only situation.  It will be enabled
>  JG> by default eventually, and should be considered in that light.
> 
> that's your point of view. mine is that this option (and code)
> is to be used only when needed.

Distros may ignore your opinion and may enable it, and users won't know
that it's enabled or even that such a feature exists - until they try to
run an older kernel. If almost nobody needs this feature, why not avoid
problems by not merging it and maintaining it separately from the
main tree?

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:30                     ` Alex Tomas
  2006-06-09 18:38                       ` Linus Torvalds
@ 2006-06-09 18:43                       ` Jeff Garzik
  2006-06-09 18:50                       ` Diego Calleja
  2 siblings, 0 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 18:43 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Linus Torvalds, Andreas Dilger, Andrew Morton, ext2-devel,
	linux-kernel, cmm, linux-fsdevel

Alex Tomas wrote:
>>>>>> Linus Torvalds (LT) writes:
>  LT> My point is, maintaining two different pieces is SIMPLER.
> 
> "different" is a key word here. why should we copy most of ext3 code
> into ext4?
> 
>  LT> It would be bigger, if you made ext3 do 48-bit block numbers.
> 
> nope, we re-use existing i_data w/o any changes. yes, we've made
> inode a bit larger to cache last found extent. this noticeably
> improves performance in some workloads, though.
> 
>  LT> See? ext3 would become strictly _worse_ for the majority of users, who 
>  LT> wouldn't get any advantage. That's my point.
> 
> would "#if CONFIG_EXT3_EXTENTS" be a good solution then?

No, that would be worse.

	Jeff




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:30                     ` Alex Tomas
  2006-06-09 18:38                       ` Linus Torvalds
  2006-06-09 18:43                       ` Jeff Garzik
@ 2006-06-09 18:50                       ` Diego Calleja
  2 siblings, 0 replies; 104+ messages in thread
From: Diego Calleja @ 2006-06-09 18:50 UTC (permalink / raw)
  To: Alex Tomas
  Cc: torvalds, adilger, alex, jeff, akpm, ext2-devel, linux-kernel,
	cmm, linux-fsdevel

On Fri, 09 Jun 2006 22:30:20 +0400,
Alex Tomas <alex@clusterfs.com> wrote:


>  LT> See? ext3 would become strictly _worse_ for the majority of users, who 
>  LT> wouldn't get any advantage. That's my point.
> 
> would "#if CONFIG_EXT3_EXTENTS" be a good solution then?

Not at all, a config option may be disabled by lots of distros
and make backwards compatibility even more difficult than
it is already going to be.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:10                 ` Andreas Dilger
  2006-06-09 18:22                   ` Linus Torvalds
@ 2006-06-09 18:40                   ` Jeff Garzik
  2006-06-09 18:59                     ` Andrew Morton
  2006-06-09 18:41                   ` Jeff Garzik
  2 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 18:40 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Linus Torvalds, Alex Tomas, Andrew Morton, ext2-devel,
	linux-kernel, cmm, linux-fsdevel

Andreas Dilger wrote:
> Having a single codebase for everyone means that it is continually maintained
> and users of ext3 aren't left out in the cold.

That implies continually upgrading ext3 for newer storage technologies, 
which in turn implies adding all sorts of incompatible formats to 
support better storage scaling, and new usage models.

This constant patching of ext3 is IMO one of the problems.  Let it 
stabilize with current storage technologies.


> On Jun 09, 2006  09:54 -0700, Linus Torvalds wrote:
>> Btw, I'm not kidding you on this one.
>>
>> THE NUMBER ONE MEMORY USAGE ON A LOT OF LOADS IS EXT3 INODES IN MEMORY!
> 
> Do you think that would be any different with a new filesystem?
> 
>> And you know what? 2TB files are totally uninteresting to 99.9999% of all 
>> people. Most people find it _much_ more interesting to have hundreds of 
>> thousands of _smaller_ files instead.
>>
>> So do this:
>>
>> 	cat /proc/slabinfo | grep ext3
> 
> # head -2 /proc/slabinfo
> slabinfo - version: 2.1
> name       <active_objs> <num_objs> <objsize> <objperslab>
> 
> # grep ext2 /proc/slabinfo
> ext2_inode_cache       0          0       572            7
> ext2_xattr             0          0        48           81
> 
> # grep ext3 /proc/slabinfo
> 
> ext3_inode_cache   30207      41418       616            6
> ext3_xattr             0          0        48           81
> 
> # grep xfs /proc/slabinfo
> xfs_ili             2558       2576       140           28
> xfs_inode           2558       2565       448            9
> 
> # grep jfs /proc/slabinfo
> jfs_ip                 0          0      1048            3
> 
> So, the ext3 inode could grow another ~50 bytes without changing the
> slab allocation size ;-), and in fact other filesystems aren't noticeably
> different.
> 
>> and be absolutely disgusted and horrified by the size of those inodes 
>> already, and ask yourself whether extending the block size to 48 bits will 
>> help or further hurt one of the biggest problems of ext3 right now?
> 
> This is then the biggest problem of all filesystems.
> 
>> (And yes, I realize that block numbers are just a small part of it. The 
>> "vfs_inode" is also a real problem - it's got _way_ too many large 
>> list-heads that explode on a 64-bit kernel, for example. Oh, well.
> 
> On a 32-bit system the vfs_inode is more than half of the size of the ext3
> inode, it is worse on 64-bit systems.
> 
>> My point is that things like this can make a very real issue _worse_ for all 
>> the people who don't care one whit about it)
> 
> The current group of changes will be a no-op if CONFIG_LBD isn't enabled,
> and I think I argued fairly strongly to also have a CONFIG_ flag to allow
> larger than 2TB file support only for those users that want it.
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
> 
> 
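
A back-of-the-envelope check of the slab numbers quoted above, as a minimal sketch. It assumes each slab is a single 4096-byte page and ignores per-slab management overhead and alignment, so the headroom it reports is an upper bound rather than what the allocator would really grant. The function names are made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: one slab is one 4 KiB page


def objs_per_slab(obj_size, slab_bytes=PAGE_SIZE):
    """How many objects of obj_size fit in one slab (overhead ignored)."""
    return slab_bytes // obj_size


def headroom(obj_size, slab_bytes=PAGE_SIZE):
    """Upper bound on how many bytes an object can grow before
    one fewer object fits in the slab."""
    count = objs_per_slab(obj_size, slab_bytes)
    largest_size_with_same_count = slab_bytes // count
    return largest_size_with_same_count - obj_size


# <objsize> values taken from the /proc/slabinfo output quoted above
for name, size in [("ext2_inode_cache", 572), ("ext3_inode_cache", 616),
                   ("xfs_inode", 448), ("jfs_ip", 1048)]:
    print(name, objs_per_slab(size), headroom(size))
```

Under these assumptions the objects-per-slab figures match the <objperslab> column for the four inode caches, and the 616-byte ext3 inode has at most 66 bytes of headroom before it drops from 6 to 5 per slab. That is roughly consistent with the "~50 bytes" estimate once real slab overhead is taken into account.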


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:40                   ` Jeff Garzik
@ 2006-06-09 18:59                     ` Andrew Morton
  2006-06-09 19:16                       ` Jeff Garzik
  2006-06-09 20:44                       ` Alan Cox
  0 siblings, 2 replies; 104+ messages in thread
From: Andrew Morton @ 2006-06-09 18:59 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: ext2-devel, linux-kernel, torvalds, cmm, linux-fsdevel, alex,
	adilger

On Fri, 09 Jun 2006 14:40:56 -0400
Jeff Garzik <jeff@garzik.org> wrote:

> Andreas Dilger wrote:
> > Having a single codebase for everyone means that it is continually maintained
> > and users of ext3 aren't left out in the cold.
> 
> That implies continually upgrading ext3 for newer storage technologies, 
> which in turn implies adding all sorts of incompatible formats to 
> support better storage scaling, and new usage models.

Look, I'm not certain either way on this - I really don't like the format
incompatibility and I'd like to see a breakdown of the performance benefits
of each of the proposed new features so perhaps we can cherrypick.  And I'm
deferring judgement until I've looked at some patches.

But Jeff, please stop this wild exaggeration!  "continually upgrading",
"all sorts of incompatible formats".  It's not helping anything.  

Today's ext3 is, afaik, 100% on-disk compatible with ext3 from five years
ago, and probably with RH's 2.2-based implementation.  So we have not done
and will not do the things which you are FUDding us about.

This is (again, as far as I recall) the first on-disk-incompatible change
in ext3 which has ever been proposed.  It's not a thing which is done
lightly and it's not a thing which is likely to happen again for a very long
time indeed.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:59                     ` Andrew Morton
@ 2006-06-09 19:16                       ` Jeff Garzik
  2006-06-09 20:27                         ` [Ext2-devel] " Chase Venters
  2006-06-09 20:44                       ` Alan Cox
  1 sibling, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 19:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: ext2-devel, linux-kernel, torvalds, cmm, linux-fsdevel, alex,
	adilger

Andrew Morton wrote:
> On Fri, 09 Jun 2006 14:40:56 -0400
> Jeff Garzik <jeff@garzik.org> wrote:
> 
>> Andreas Dilger wrote:
>>> Having a single codebase for everyone means that it is continually maintained
>>> and users of ext3 aren't left out in the cold.
>> That implies continually upgrading ext3 for newer storage technologies, 
>> which in turn implies adding all sorts of incompatible formats to 
>> support better storage scaling, and new usage models.
> 
> Look, I'm not certain either way on this - I really don't like the format
> incompatibility and I'd like to see a breakdown of the performance benefits
> of each of the proposed new features so perhaps we can cherrypick.  And I'm
> deferring judgement until I've looked at some patches.
> 
> But Jeff, please stop this wild exaggeration!  "continually upgrading",
> "all sorts of incompatible formats".  It's not helping anything.  
> 
> Today's ext3 is, afaik, 100% on-disk compatible with ext3 from five years
> ago, and probably with RH's 2.2-based implementation.  So we have not done
> and will not do the things which you are FUDding us about.
> 
> This is (again, as far as I recall) the first on-disk-incompatible change
> in ext3 which has ever been proposed.  It's not a thing which is done
> lightly and it's not a thing which is likely to happen again for a very long
> time indeed.

That's not really true. I include in the list EXT3_FEATURE_RO_COMPAT_*, 
EXT3_FEATURE_INCOMPAT_*, 32-bit uid/gid, ISTR some ACL-related mess, and 
the online resizing stuff that produces a filesystem slightly different 
than what mke2fs would produce for the same [larger] sized block device. 
Red Hat has had at least one problem in the past where users were 
annoyed at format changes (htree?).
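
For readers who have not looked at the feature flags, here is a minimal sketch of how the three ext2/ext3 superblock feature masks (compat, ro_compat, incompat) decide whether a given kernel can mount a filesystem. The function name and the bit values are illustrative assumptions, not the kernel's actual code or constants.

```python
def mount_decision(sb_ro_compat, sb_incompat, known_ro_compat, known_incompat):
    """Return what a kernel that understands the 'known_*' feature bits
    may do with a filesystem whose superblock carries the 'sb_*' bits.

    Unknown compat bits never matter, so they are not passed in at all.
    """
    if sb_incompat & ~known_incompat:
        return "refuse"          # unknown incompat feature: cannot mount
    if sb_ro_compat & ~known_ro_compat:
        return "read-only"       # unknown ro_compat feature: no writes
    return "read-write"


# Illustrative bit assignments (hypothetical values for this sketch)
OLD_KERNEL_INCOMPAT = 0x0002 | 0x0004
EXTENTS_BIT = 0x0040

print(mount_decision(0, 0x0002, 0, OLD_KERNEL_INCOMPAT))
print(mount_decision(0, 0x0002 | EXTENTS_BIT, 0, OLD_KERNEL_INCOMPAT))
```

The second call is the case being argued about in this thread: once a new incompat bit lands in the superblock, a kernel that does not know that bit has to refuse the mount.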

I certainly grant that extents and 48bit are format changes on a -much- 
larger scale than in the past.  Absolutely.

That's why I feel that this is a good point to calm down ext3 
development, and start putting stuff like extents into ext4.  If we are 
starting to make major changes to the format, that should be a signal 
that we are starting to work on a new filesystem, rather than patching 
an old one.

I disagree with the "years to stabilize ext4" argument, because we are 
starting from a known good point.  I think ext4 will be easier to 
maintain and tune for modern storage systems, if we don't have to worry 
as much about that stuff for ext3.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 19:16                       ` Jeff Garzik
@ 2006-06-09 20:27                         ` Chase Venters
  0 siblings, 0 replies; 104+ messages in thread
From: Chase Venters @ 2006-06-09 20:27 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, adilger, torvalds, alex, ext2-devel, linux-kernel,
	cmm, linux-fsdevel

On Fri, 9 Jun 2006, Jeff Garzik wrote:

> I disagree with the "years to stabilize ext4" argument, because we are 
> starting from a known good point.  I think ext4 will be easier to maintain 
> and tune for modern storage systems, if we don't have to worry as much about 
> that stuff for ext3.

Let's say we

# cp ext3 ext4
# cat extents 48bit | patch

and then roll it out in 2.6.18. That in and of itself is probably fine and 
stable (though it's no different than ext3 except for the name and the two 
new additions).

But are you going to do this again for ext5 when more features come along? 
Or are you going to warn ext4 users that the FS is not expected to be stable?

If you do the latter, be prepared for people to be wary of using it for a 
long while. The difference is between actual and perceived stability.

To put a finer point on it - I've got a system that's been running 
flawlessly for years on 2.5.3. It's actually been stable - never had any 
sort of crashing problem at all. But I'm essentially crazy for running 
that kernel. At the time I installed it, it certainly wasn't perceived as 
stable. If the computer in question were any more than a file server / 
iptables box for my home, I'd have said "well, hell, I think I'm going to 
have to do without 2.5 so that I can have something trustworthy."

(Amusingly enough, I started assembling a replacement for it recently, 
if only to have something newer and more capable. Having gone from 
Slackware to Gentoo I decided to give the April stable 
Debian release a whirl. Imagine my shock and awe when I watched Debian 
boot into a 2.4 kernel :P)

> 	Jeff
>

Cheers,
Chase

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:59                     ` Andrew Morton
  2006-06-09 19:16                       ` Jeff Garzik
@ 2006-06-09 20:44                       ` Alan Cox
  2006-06-11 15:52                         ` [Ext2-devel] " Arjan van de Ven
  1 sibling, 1 reply; 104+ messages in thread
From: Alan Cox @ 2006-06-09 20:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jeff Garzik, ext2-devel, linux-kernel, torvalds, cmm,
	linux-fsdevel, alex, adilger

On Fri, 2006-06-09 at 11:59 -0700, Andrew Morton wrote:
> Today's ext3 is, afaik, 100% on-disk compatible with ext3 from five years
> ago, and probably with RH's 2.2-based implementation.  

If your files are under 2GB long, you've not used any attributes,
SELinux labels or various other things maybe. In the practical real
world case it isn't. I doubt many Fedora/Red Hat users have a single FS
from RHEL4/FC1 onwards that is readable by 2.2 ext3 (or most 2.4 ext3)

OTOH the number of complaints about this is minimal, people want to go
forwards in a controlled manner not backwards.

Alan

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 20:44                       ` Alan Cox
@ 2006-06-11 15:52                         ` Arjan van de Ven
  0 siblings, 0 replies; 104+ messages in thread
From: Arjan van de Ven @ 2006-06-11 15:52 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andrew Morton, Jeff Garzik, adilger, torvalds, alex, ext2-devel,
	linux-kernel, cmm, linux-fsdevel

On Fri, 2006-06-09 at 21:44 +0100, Alan Cox wrote:
> OTOH the number of complaints about this is minimal, people want to go
> forwards in a controlled manner not backwards.

well... they want to be able to go "a little bit" backwards; say one
version of an OS (6 months). E.g. the scenario that ought to work is "go
to newer version, hate it, go back". But yes that's a limited time to go
back, not the "go back to 2.2" kind of "go back".


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:10                 ` Andreas Dilger
  2006-06-09 18:22                   ` Linus Torvalds
  2006-06-09 18:40                   ` Jeff Garzik
@ 2006-06-09 18:41                   ` Jeff Garzik
  2 siblings, 0 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 18:41 UTC (permalink / raw)
  To: Linus Torvalds, Alex Tomas, Jeff Garzik, Andrew Morton,
	ext2-devel, linux-kernel, cmm, linux-fsdevel

Andreas Dilger wrote:
> The current group of changes will be a no-op if CONFIG_LBD isn't enabled,
> and I think I argued fairly strongly to also have a CONFIG_ flag to allow
> larger than 2TB file support only for those users that want it.

Please be realistic.

Distros will all want to turn this on, from now until eternity.
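
For reference, the arithmetic behind the 2TB and 48-bit figures, as a rough sketch. It assumes 4 KiB filesystem blocks, and it assumes the per-file 2TB limit comes from a 32-bit count of 512-byte sectors in the inode, which is the usual explanation for ext3 rather than something stated in this thread. All function names are hypothetical.

```python
TIB = 1 << 40


def max_fs_bytes(block_number_bits, block_size=4096):
    """Largest filesystem addressable with the given block-number width."""
    return (1 << block_number_bits) * block_size


def max_file_bytes(sector_count_bits=32, sector_size=512):
    """File size cap when the inode counts 512-byte sectors in a 32-bit field
    (assumed source of the 2TB limit)."""
    return (1 << sector_count_bits) * sector_size


def split_48bit(block):
    """Split a 48-bit block number into a 32-bit low and a 16-bit high part."""
    assert 0 <= block < (1 << 48)
    return block & 0xFFFFFFFF, block >> 32


def join_48bit(low, high):
    """Inverse of split_48bit."""
    return (high << 32) | low


print(max_fs_bytes(32) // TIB, "TiB with 32-bit block numbers")
print(max_fs_bytes(48) // TIB, "TiB with 48-bit block numbers")
print(max_file_bytes() // TIB, "TiB per file")
```

Under those assumptions, 32-bit block numbers top out at 16 TiB per filesystem and 48-bit block numbers at 2^60 bytes, while a single file is capped at 2 TiB regardless of the block-number width.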

	Jeff



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:25             ` Linus Torvalds
  2006-06-09 16:48               ` Alex Tomas
  2006-06-09 16:54               ` Linus Torvalds
@ 2006-06-09 17:12               ` Jeff Anderson-Lee
  2006-06-09 18:02               ` Andrew Morton
  3 siblings, 0 replies; 104+ messages in thread
From: Jeff Anderson-Lee @ 2006-06-09 17:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: 'ext2-devel', linux-fsdevel

Linus Torvalds wrote:
> On Fri, 9 Jun 2006, Alex Tomas wrote:
>  
> > I believe it's as stable as before until you mount with extents
> > mount option.
>    
> In contrast, the last time two different filesystems introduced bugs in 
> each other was approximately "never". They simply don't modify each other's
> code, they don't look at each other's data structures, and they don't jump 
> into each other's routines.

As an interested bystander (and large filesystem user), I'd say I tend to 
agree with Linus and Jeff on this one.

* ext3 is arguably the main Linux filesystem: too important to keep 
  "experimenting" with.

* I'd encourage a >2TB version, but call it ext4.  It makes it clear
  that you are entering new territory.

* Take advantage of the switch to remove some of the backward compatibility
  cruft from the ext4 version -- make it a clean, explicit break.

* [Possibly even inoculate ext3 against creeping featurism and work on 
  cleanup and optimization instead.]

This is not intended to slight the work/position of the ext3 developers,
merely to inform them of an end-user's perspective.

----
Jeff Anderson-Lee
Petabyte Storage Infrastructure Project
University of California Berkeley
"Simplify, simplify, simplify." -- Henry David Thoreau 
"I think one 'simplify' would have sufficed." -- Ralph Waldo Emerson 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:25             ` Linus Torvalds
                                 ` (2 preceding siblings ...)
  2006-06-09 17:12               ` Jeff Anderson-Lee
@ 2006-06-09 18:02               ` Andrew Morton
  3 siblings, 0 replies; 104+ messages in thread
From: Andrew Morton @ 2006-06-09 18:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: alex, jeff, ext2-devel, linux-kernel, cmm, linux-fsdevel, adilger

On Fri, 9 Jun 2006 09:25:57 -0700 (PDT)
Linus Torvalds <torvalds@osdl.org> wrote:

> (buffer heads! In 2006!)

We should be able to make the vast majority of those go away, btw.

We already have `-o data=writeback,nobh'.  That gives us writeback-mode
with no buffer_heads on the pagecache.

On top of that we can implement nobh ordered-mode by adding an inode walk
which calls do_sync_file_range() into the appropriate place in commit.

The tricky part is the inode walk - at present super_block.s_list is a
list_head and it's not trivial to walk that without missing some inodes.

Probably it could be done via a new fs-private dirty-inode list which we
handle carefully, or via a walk of an i_ino-ordered radix-tree, which
doesn't miss things.
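
To illustrate why an i_ino-ordered walk doesn't miss things, here is a small sketch. A sorted Python list stands in for the radix-tree, and the class and function names are invented for this example; it is not kernel code. The walk keeps only the last inode number visited and always asks for the next larger key, so entries being added or removed during the walk cannot make it skip an inode that was present throughout.

```python
import bisect


class InodeIndex:
    """Stand-in for an i_ino-ordered index (a sorted list, not a radix-tree)."""

    def __init__(self, inos=()):
        self._inos = sorted(set(inos))

    def add(self, ino):
        pos = bisect.bisect_left(self._inos, ino)
        if pos == len(self._inos) or self._inos[pos] != ino:
            self._inos.insert(pos, ino)

    def remove(self, ino):
        pos = bisect.bisect_left(self._inos, ino)
        if pos < len(self._inos) and self._inos[pos] == ino:
            del self._inos[pos]

    def next_after(self, ino):
        """Smallest inode number strictly greater than ino, or None."""
        pos = bisect.bisect_right(self._inos, ino)
        return self._inos[pos] if pos < len(self._inos) else None


def walk(index, visit):
    """Visit inodes in increasing i_ino order, re-looking-up after each step."""
    cursor = -1
    while True:
        ino = index.next_after(cursor)
        if ino is None:
            return
        visit(ino)          # may add/remove entries in the index
        cursor = ino
```

An inode added behind the cursor during the walk is not visited on this pass, which would be acceptable if such inodes belong to the next commit anyway; that last point is an assumption about the intended use, not something the sketch enforces.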

I floated this a year or so ago, but no little fishies bit.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:08     ` Jeff Garzik
  2006-06-09 15:25       ` Jeff Garzik
@ 2006-06-09 15:28       ` Alex Tomas
  2006-06-09 15:44         ` Jeff Garzik
  2006-06-09 15:53         ` Gerrit Huizenga
  2006-06-09 20:32       ` Stephen C. Tweedie
  2 siblings, 2 replies; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 15:28 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andreas Dilger, Andrew Morton, ext2-devel, linux-kernel,
	Linus Torvalds, cmm, linux-fsdevel

 JG> "ext3" will become more and more meaningless.  It could mean _any_ of 
 JG> several filesystem metadata variants, and the admin will have no clue 
 JG> which variant they are talking to until they try to mount the blkdev 
 JG> (and possibly fail the mount).

debugfs <dev> -R stats | grep features ?


thanks, Alex

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:28       ` Alex Tomas
@ 2006-06-09 15:44         ` Jeff Garzik
  2006-06-09 15:53           ` Alex Tomas
  2006-06-09 15:53         ` Gerrit Huizenga
  1 sibling, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 15:44 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Andreas Dilger, Andrew Morton, ext2-devel, linux-kernel,
	Linus Torvalds, cmm, linux-fsdevel

Alex Tomas wrote:
>  JG> "ext3" will become more and more meaningless.  It could mean _any_ of 
>  JG> several filesystem metadata variants, and the admin will have no clue 
>  JG> which variant they are talking to until they try to mount the blkdev 
>  JG> (and possibly fail the mount).
> 
> debugfs <dev> -R stats | grep features ?

The question is, do you

a) expect users to run this magic command, and DTRT or

b) watch users boot w/ extents, accidentally do something silly like 
writing data to a file, and become locked into a new subset of kernels?

The simple act of writing data to a file has become an _irrevocable 
filesystem upgrade event_.

	Jeff




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:44         ` Jeff Garzik
@ 2006-06-09 15:53           ` Alex Tomas
  2006-06-09 15:52             ` Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 15:53 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, ext2-devel, linux-kernel, Linus Torvalds, cmm,
	linux-fsdevel, Alex Tomas, Andreas Dilger

>>>>> Jeff Garzik (JG) writes:

 JG> Alex Tomas wrote:
 >> JG> "ext3" will become more and more meaningless.  It could mean _any_ of
 >> JG> several filesystem metadata variants, and the admin will have no clue
 >> JG> which variant they are talking to until they try to mount the blkdev
 >> JG> (and possibly fail the mount).
 >> debugfs <dev> -R stats | grep features ?

 JG> The question is, do you

 JG> a) expect users to run this magic command, and DTRT or

 JG> b) watch users boot w/ extents, accidentally do something silly like
 JG> writing data to a file, and become locked into a new subset of kernels?

at the moment there is no way to "boot w/ extents". you must enable
them by mount option.


thanks, Alex

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:53           ` Alex Tomas
@ 2006-06-09 15:52             ` Jeff Garzik
  2006-06-09 16:02               ` Alex Tomas
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 15:52 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Andrew Morton, ext2-devel, linux-kernel, Linus Torvalds, cmm,
	linux-fsdevel, Andreas Dilger

Alex Tomas wrote:
>>>>>> Jeff Garzik (JG) writes:
> 
>  JG> Alex Tomas wrote:
>  >> JG> "ext3" will become more and more meaningless.  It could mean _any_ of
>  >> JG> several filesystem metadata variants, and the admin will have no clue
>  >> JG> which variant they are talking to until they try to mount the blkdev
>  >> JG> (and possibly fail the mount).
>  >> debugfs <dev> -R stats | grep features ?
> 
>  JG> The question is, do you
> 
>  JG> a) expect users to run this magic command, and DTRT or
> 
>  JG> b) watch users boot w/ extents, accidentally do something silly like
>  JG> writing data to a file, and become locked into a new subset of kernels?
> 
> at the moment there is no way to "boot w/ extents". you must enable
> them by mount option.

Think about how distros will deploy this feature.  Also, think about how 
scalable that line of thinking is...

	Jeff

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:52             ` Jeff Garzik
@ 2006-06-09 16:02               ` Alex Tomas
  2006-06-09 16:04                 ` [Ext2-devel] " Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 16:02 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, ext2-devel, linux-kernel, Linus Torvalds, cmm,
	linux-fsdevel, Alex Tomas, Andreas Dilger

>>>>> Jeff Garzik (JG) writes:

 JG> Alex Tomas wrote:
 >> at the moment there is no way to "boot w/ extents". you must enable
 >> them by mount option.

 JG> Think about how distros will deploy this feature.  Also, think about
 JG> how scalable that line of thinking is...

I may be wrong, but I tend to think if they're stupid enough to enable
experimental mount option by default, they can do s/ext3/ext4 as well.

thanks, Alex

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:02               ` Alex Tomas
@ 2006-06-09 16:04                 ` Jeff Garzik
  0 siblings, 0 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 16:04 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Andreas Dilger, Andrew Morton, ext2-devel, linux-kernel,
	Linus Torvalds, cmm, linux-fsdevel

Alex Tomas wrote:
>>>>>> Jeff Garzik (JG) writes:
> 
>  JG> Alex Tomas wrote:
>  >> at the moment there is no way to "boot w/ extents". you must enable
>  >> them by mount option.
> 
>  JG> Think about how distros will deploy this feature.  Also, think about
>  JG> how scalable that line of thinking is...
> 
> I may be wrong, but I tend to think if they're stupid enough to enable
> experimental mount option by default, they can do s/ext3/ext4 as well.

<sigh>  At some point in the future, it will not be experimental.

	Jeff




* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:28       ` Alex Tomas
  2006-06-09 15:44         ` Jeff Garzik
@ 2006-06-09 15:53         ` Gerrit Huizenga
  2006-06-09 16:03           ` Jeff Garzik
  2006-06-09 16:09           ` Linus Torvalds
  1 sibling, 2 replies; 104+ messages in thread
From: Gerrit Huizenga @ 2006-06-09 15:53 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Jeff Garzik, Andrew Morton, ext2-devel, linux-kernel,
	Linus Torvalds, cmm, linux-fsdevel, Andreas Dilger

On Fri, 09 Jun 2006 19:28:22 +0400, Alex Tomas wrote:
>  JG> "ext3" will become more and more meaningless.  It could mean _any_ of 
>  JG> several filesystem metadata variants, and the admin will have no clue 
>  JG> which variant they are talking to until they try to mount the blkdev 
>  JG> (and possibly fail the mount).
> 
> debugfs <dev> -R stats | grep features ?

Sounds similar to cat /proc/cpuinfo.  How *do* we deal with processors
which have all these many different features?  Probably better than we
would if each variant were viewed as a different architecture.

Jeff's approach taken to the ridiculous would mean that we'd have
ext versions 1-40 by now at least.  I don't think that helps much,
either.

I think the ext2/3 team has done a great job of providing compatibility.
It isn't perfect compatibility forwards *and* backwards, but moving
forwards always seems to be pretty reasonable.
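
As a rough sketch of how the feature-flag scheme decides whether a given
kernel can use a filesystem: ext2/3 keep separate compat, ro_compat and
incompat flag sets, and a kernel refuses the mount on an unknown incompat
bit and falls back to read-only on an unknown ro_compat bit.  The function
name and the bit values below are made up for illustration; they are not
the real on-disk constants.

```python
# Illustrative bit values only; the real constants live in the ext3 headers.
INCOMPAT_FILETYPE = 0x1
INCOMPAT_EXTENTS = 0x2        # hypothetical value for the new extents flag
RO_COMPAT_LARGE_FILE = 0x1


def mount_mode(fs_incompat, fs_ro_compat, known_incompat, known_ro_compat):
    """Return how a kernel that knows the given feature bits may mount the fs."""
    if fs_incompat & ~known_incompat:
        return "refuse"       # unknown incompat feature: do not touch the fs
    if fs_ro_compat & ~known_ro_compat:
        return "read-only"    # unknown ro_compat feature: reading is still safe
    return "read-write"


# An older kernel that has never heard of extents.
old_kernel = dict(known_incompat=INCOMPAT_FILETYPE,
                  known_ro_compat=RO_COMPAT_LARGE_FILE)

print(mount_mode(INCOMPAT_FILETYPE, RO_COMPAT_LARGE_FILE, **old_kernel))
print(mount_mode(INCOMPAT_FILETYPE | INCOMPAT_EXTENTS,
                 RO_COMPAT_LARGE_FILE, **old_kernel))
```

The second call is the case being argued about in this thread: once an
incompat bit is set on disk, older kernels stop mounting the filesystem.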

gerrit

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:53         ` Gerrit Huizenga
@ 2006-06-09 16:03           ` Jeff Garzik
  2006-06-09 16:09           ` Linus Torvalds
  1 sibling, 0 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 16:03 UTC (permalink / raw)
  To: Gerrit Huizenga
  Cc: Alex Tomas, Andrew Morton, ext2-devel, linux-kernel,
	Linus Torvalds, cmm, linux-fsdevel, Andreas Dilger

Gerrit Huizenga wrote:
> Jeff's approach taken to the ridiculous would mean that we'd have
> ext versions 1-40 by now at least.  I don't think that helps much,
> either.

That's plainly silly.  Like everything else in life, it is a balance of 
costs.

At some point, ext3's fs-feature-flag approach increases the 
combinations of metadata variants you must support exponentially.
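
To put a number on that: if the flags were fully independent, n on/off
flags would give 2^n possible on-disk variants.  A quick sketch, where the
flag names are just labels for features mentioned in this thread and the
independence assumption makes the result an upper bound rather than a
count of what is actually supported:

```python
from itertools import combinations


def feature_combinations(features):
    """Enumerate every subset of the given feature labels."""
    result = []
    for k in range(len(features) + 1):
        result.extend(combinations(features, k))
    return result


# Labels only; assumed independent for the sake of the estimate.
features = ["htree", "online_resize", "extents", "48bit"]
combos = feature_combinations(features)

print(len(combos))            # 2 ** 4 subsets
print(len(combos) == 2 ** len(features))
```

Each extra independent flag doubles the number of variants that would need
testing.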

Moving to extents and 48bit (which I want) is a big enough step that, 
IMO, some of the support costs become far more obvious.

	Jeff

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:53         ` Gerrit Huizenga
  2006-06-09 16:03           ` Jeff Garzik
@ 2006-06-09 16:09           ` Linus Torvalds
  2006-06-09 17:58             ` Gerrit Huizenga
  1 sibling, 1 reply; 104+ messages in thread
From: Linus Torvalds @ 2006-06-09 16:09 UTC (permalink / raw)
  To: Gerrit Huizenga
  Cc: Alex Tomas, Jeff Garzik, Andrew Morton, ext2-devel, linux-kernel,
	cmm, linux-fsdevel, Andreas Dilger

On Fri, 9 Jun 2006, Gerrit Huizenga wrote:
> 
> Jeff's approach taken to the ridiculous would mean that we'd have
> ext versions 1-40 by now at least.  I don't think that helps much,
> either.

On the other hand, I _guarantee_ you that it helps that we have ext2-3, 
and not just ext2 (nobody even tried to keep ext1 compatible, thank the 
Gods).

If for no other reason, than the fact that the ext3 development could be 
much more aggressive early on. Exactly because it did NOT screw up the old 
filesystem that everybody else depended on.

So we have empirical evidence that splitting filesystem work up does 
actually help. 

			Linus

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:09           ` Linus Torvalds
@ 2006-06-09 17:58             ` Gerrit Huizenga
  2006-06-09 18:25               ` [Ext2-devel] " Chase Venters
                                 ` (2 more replies)
  0 siblings, 3 replies; 104+ messages in thread
From: Gerrit Huizenga @ 2006-06-09 17:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Jeff Garzik, ext2-devel, linux-kernel, cmm,
	linux-fsdevel, Alex Tomas, Andreas Dilger

On Fri, 09 Jun 2006 09:09:01 PDT, Linus Torvalds wrote:
> On Fri, 9 Jun 2006, Gerrit Huizenga wrote:
> > 
> > Jeff's approach taken to the ridiculous would mean that we'd have
> > ext versions 1-40 by now at least.  I don't think that helps much,
> > either.
> 
> On the other hand, I _guarantee_ you that it helps that we have ext2-3, 
> and not just ext2 (nobody even tried to keep ext1 compatible, thank the 
> Gods).

I had originally argued for ext4 as well, based on the fact that it would
allow lots of potential cleanups & simplifications and at the same time
would allow a break in the on-disk filesystem layout.

These changes don't yet change the actual on-disk layout and that might
be something that would be done if ext4 were a real, new filesystem.

But then how long until ext4 is used enough to be put into production?
How much testing will it *really* get in any form?  How long before
the people that are using 100 TB+ disk farms today (some of which are
chopping filesystems into 2-8 GB chunks, others with 2 TB filesystems
today) actually trust this new filesystem?  (Most vendors don't support
JFS today; XFS support isn't much better.)

We are seeing storage needs increasing at a frightening rate.  Health
Care folks want to store your MRIs, X-rays, ultrasounds, etc. in high
res digital format across your entire life in near-line format.  Terabytes
over time per person.  Europe is already doing this pretty extensively,
the US is following suit.  Digital media creation has huge storage needs.
Most everything is moving to podcasts, webcasts, streaming audio & video.
Storage is huge, and ext3 is at the current breaking point.

I'd argue that whatever we call it, we need a standard, stable, supported
solution *soon* for large files, large filesystems, large storage systems
in Linux.

I'd think the quickest path is to relieve the pressure now in ext3.

We still haven't solved the filesystem check time problem, which is the
next big bugaboo.  But getting large filesystems to real customers soon,
e.g. in mainline, well tested, ready for distro support is my real goal.

> If for no other reason, than the fact that the ext3 development could be 
> much more aggressive early on. Exactly because it did NOT screw up the old 
> filesystem that everybody else depended on.

Yes, but we want aggressive with robustness for real users soon.  Lots
of crazy ext4 development could become technical wanking in no time, with
no point of stability, and no general usefulness in the short term.

> So we have empirical evidence that splitting filesystem work up does 
> actually help. 

Agreed.  But... Maybe that should be the set of changes *following*
extents.  Then the file format can change, several of the pending ideas
can be worked in, and some of the backwards compatibility can be cleaned
out if it is in the way.  Then the extents work can get us something
usable in all the interim distro releases for the real users who are
screaming now about the filesystem size limits.

gerrit

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 17:58             ` Gerrit Huizenga
@ 2006-06-09 18:25               ` Chase Venters
  2006-06-10 13:46               ` Adrian Bunk
  2006-06-13 13:34               ` Helge Hafting
  2 siblings, 0 replies; 104+ messages in thread
From: Chase Venters @ 2006-06-09 18:25 UTC (permalink / raw)
  To: Gerrit Huizenga
  Cc: Linus Torvalds, Alex Tomas, Jeff Garzik, Andrew Morton,
	ext2-devel, linux-kernel, cmm, linux-fsdevel, Andreas Dilger

On Fri, 9 Jun 2006, Gerrit Huizenga wrote:

> We are seeing storage needs increasing at a frightening rate.  Health
> Care folks want to store your MRIs, X-rays, ultrasounds, etc. in high
> res digital format across your entire life in near-line format.  Terabytes
> over time per person.  Europe is already doing this pretty extensively,
> the US is following suit.  Digital media creation has huge storage needs.
> Most everything is moving to podcasts, webcasts, streaming audio & video.
> Storage is huge, and ext3 is at the current breaking point.
>
> I'd argue that whatever we call it, we need a standard, stable, supported
> solution *soon* for large files, large filesystems, large storage systems
> in Linux.
>
> I'd think the quickest path is to relieve the pressure now in ext3.

Makes sense...

>> So we have empirical evidence that splitting filesystem work up does
>> actually help.
>
> Agreed.  But... Maybe that should be the set of changes *following*
> extents.  Then the file format can change, several of the pending ideas
> can be worked in, and some of the backwards compatibility can be cleaned
> out if it is in the way.  Then the extents work can get us something
> usable in all the interim distro releases for the real users who are
> screaming now about the filesystem size limits.

Let's call ext3 "Linux 2.4" for a second and ext(x) w/extents and 48-bit 
"Linux 2.5". We can now do all the crazy, wild work we want on 2.5, but 
people need it tomorrow. And they can have it, but we're stamping 
"Dangerous! Dangerous! Unstable! API changes every 5 minutes, your data 
will be obsoleted each release!" all over it. This goes on for years until 
we finally reach a point where we can roll out "Linux 2.6".

The trouble is that "Linux 2.6" is something many of us are going to be 
wanting _now_.

Now, taking the quotes back off "Linux 2.6" and speaking about the kernel 
as a whole again - isn't lots of incremental stable releases with new 
functionality something that cutting off the development arm made 
possible?

I acknowledge the concerns about filesystem stability and Linus's points 
about improperly shared code. From a practical standpoint, I see the need 
of bigger filesystems coming.

And the biggest practical problem I see is one of perception. Making 
'ext4' means labelling it unstable for a while. And once something like a 
_filesystem_ is called unstable, it's going to be a long time before 
people trust it with terabytes of their incredibly valuable data (even if 
we promise them that it's mostly an ext3 fork).

Whereas if you play with some experimental 48-bit extension on ext3, well, 
ext3 already has a good reputation and is in use everywhere, so maybe this 
isn't a bad "last feature" to add before forking off into ext4-land?

> gerrit

Cheers,
Chase

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 17:58             ` Gerrit Huizenga
  2006-06-09 18:25               ` [Ext2-devel] " Chase Venters
@ 2006-06-10 13:46               ` Adrian Bunk
  2006-06-13 13:34               ` Helge Hafting
  2 siblings, 0 replies; 104+ messages in thread
From: Adrian Bunk @ 2006-06-10 13:46 UTC (permalink / raw)
  To: Gerrit Huizenga
  Cc: Linus Torvalds, Alex Tomas, Jeff Garzik, Andrew Morton,
	ext2-devel, linux-kernel, cmm, linux-fsdevel, Andreas Dilger

On Fri, Jun 09, 2006 at 10:58:00AM -0700, Gerrit Huizenga wrote:
> 
> On Fri, 09 Jun 2006 09:09:01 PDT, Linus Torvalds wrote:
> > On Fri, 9 Jun 2006, Gerrit Huizenga wrote:
> > > 
> > > > Jeff's approach taken to the ridiculous would mean that we'd have
> > > ext versions 1-40 by now at least.  I don't think that helps much,
> > > either.
> > 
> > On the other hand, I _guarantee_ you that it helps that we have ext2-3, 
> > and not just ext2 (nobody even tried to keep ext1 compatible, thank the 
> > Gods).
>  
> I had originally argued for ext4 as well based on the fact that it would
> allow lots of potential cleanups & simplifications and at the same time
> would allow a break in the on disk filesystems layout.
> 
> These changes don't yet change the actual on-disk layout and that might
> be something that would be done if ext4 were a real, new filesystem.
> 
> But then how long until ext4 is used enough to be put into production?
> How much testing will it *really* get in any form?  How long before
> the people that are using 100 TB+ disk farms today (some of which are
> chopping filesystems into 2-8 GB chunks, others with 2 TB filesystems
> today) actually trust this new filesystem (most vendors don't support
> JFS today, XFS support isn't much better).

You want to get the new features into ext3 instead of creating ext4 so
that they get better tested.

Other people in this thread want to get the new features into ext3
instead of creating ext4, arguing that this won't do harm to existing
users since users will have to explicitly enable it.

Hearing people use contradictory arguments in the same discussion always
sounds as if they don't actually know what they want to do...

> We are seeing storage needs increasing at a frightening rate.  Health
> Care folks want to store your MRIs, X-rays, ultrasounds, etc. in high
> res digital format across your entire life in near-line format.  Terabytes
> over time per person.  Europe is already doing this pretty extensively,
> the US is following suit.  Digital media creation has huge storage needs.
> Most everything is moving to podcasts, webcasts, streaming audio & video.
> Storage is huge, and ext3 is at the current breaking point.
> 
> I'd argue that whatever we call it, we need a standard, stable, supported
> solution *soon* for large files, large filesystems, large storage systems
> in Linux.
> 
> I'd think the quickest path is to relieve the pressure now in ext3.

Why aren't JFS and XFS good enough for relieving the pressure now?

> We still haven't solved the filesystem check time problem, which is the
> next big bugaboo.  But getting large filesystems to real customers soon,
> e.g. in mainline, well tested, ready for distro support is my real goal.
>...

Other people have the "no regressions for existing ext3 users" goal.

> gerrit

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 17:58             ` Gerrit Huizenga
  2006-06-09 18:25               ` [Ext2-devel] " Chase Venters
  2006-06-10 13:46               ` Adrian Bunk
@ 2006-06-13 13:34               ` Helge Hafting
  2 siblings, 0 replies; 104+ messages in thread
From: Helge Hafting @ 2006-06-13 13:34 UTC (permalink / raw)
  To: Gerrit Huizenga
  Cc: Linus Torvalds, Alex Tomas, Jeff Garzik, Andrew Morton,
	ext2-devel, linux-kernel, cmm, linux-fsdevel, Andreas Dilger

Gerrit Huizenga wrote:
> On Fri, 09 Jun 2006 09:09:01 PDT, Linus Torvalds wrote:
>   
>> On Fri, 9 Jun 2006, Gerrit Huizenga wrote:
>>     
>>> Jeff's approach taken to the ridiculous would mean that we'd have
>>> ext versions 1-40 by now at least.  I don't think that helps much,
>>> either.
>>>       
>> On the other hand, I _guarantee_ you that it helps that we have ext2-3, 
>> and not just ext2 (nobody even tried to keep ext1 compatible, thank the 
>> Gods).
>>     
>  
> I had originally argued for ext4 as well based on the fact that it would
> allow lots of potential cleanups & simplifications and at the same time
> would allow a break in the on disk filesystems layout.
>
> These changes don't yet change the actual on-disk layout and that might
> be something that would be done if ext4 were a real, new filesystem.
>
> But then how long until ext4 is used enough to be put into production?
>   
No problem.  It didn't take long for ext3 - it won't take long for ext4.
First, you have developers and some enthusiasts using it.
Then, you get the thousands of people who like living
on the edge using ext4. As soon as it doesn't have bad known bugs.
Then some distros pick it up, wanting to be first with
large-disk support.
After that, it is considered "harmless".

If a break in on-disk layout is useful, then the time is now while
a new fs is introduced anyway.  It could be 7 years to the next chance.

Helge Hafting



* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:08     ` Jeff Garzik
  2006-06-09 15:25       ` Jeff Garzik
  2006-06-09 15:28       ` Alex Tomas
@ 2006-06-09 20:32       ` Stephen C. Tweedie
  2006-06-09 20:46         ` Linus Torvalds
  2 siblings, 1 reply; 104+ messages in thread
From: Stephen C. Tweedie @ 2006-06-09 20:32 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, Stephen Tweedie, ext2-devel@lists.sourceforge.net,
	linux-kernel, Linus Torvalds, Mingming Cao, linux-fsdevel,
	Andreas Dilger

Hi,

On Fri, 2006-06-09 at 11:08 -0400, Jeff Garzik wrote:

> Stuffing more and more features into fs/ext3 means you are following the 
> path that leads to reiser4...  where EVERYTHING under the hood is 
> mutable, all within fs/ext3.

> Why do you insist upon calling the end result ext3, when the truth is 
> that you are slowly rewriting ext3?

The trouble is, does it make sense to do otherwise?

Should large file support have resulted in ext4?  ACLs/xattrs, ext5?
Htree, ext6?  Online resize, ext7?  Yes, let's make it ext8 for extents!

> Here's a key question for ext3 developers, which I bet has no answer: 
> when is it enough?

When is the Linux syscall interface enough?  When should we just bump it
and cut out all the compatibility interfaces?

No, we don't; we let people configure certain obsolete bits out (a.out
support etc), but we keep it in the tree despite the indirection cost to
maintain multiple interfaces etc.

> > While this is partly true, one of the big benefits is that you can
> > transparently upgrade your system to use the new features and improve
> > performance without a long outage window.  Having a completely separate
> 
> Changing the name to ext4 doesn't erase this capability.

The name is irrelevant here.  FWIW, something we've considered is to
make the user visibility of a batch of new features more obvious by
labelling them "ext4", so "mke4fs" would automatically enable those
features and the filesystem could register "ext4" as an fs type in the
kernel.

But that could be done without forking the codebase.  It would just be a
matter of binding feature flag sets to the given name.  What you're
talking about is forking the codebase itself, and I don't see the need
for that right now.
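
A minimal sketch of that idea, binding a name to a feature set without a
second codebase.  The profile table, the feature labels and both function
names are hypothetical; nothing here is taken from the actual patches.

```python
# Hypothetical profiles: a filesystem name is just a label for a feature set.
FS_PROFILES = {
    "ext3": frozenset({"journal", "htree"}),
    "ext4": frozenset({"journal", "htree", "extents", "48bit"}),
}


def mkfs_features(fs_name):
    """Feature set a hypothetical mkfs would enable for this name."""
    return FS_PROFILES[fs_name]


def fs_type_for(enabled):
    """Smallest profile name whose feature set covers everything enabled."""
    for name, feats in sorted(FS_PROFILES.items(), key=lambda kv: len(kv[1])):
        if enabled <= feats:
            return name
    raise ValueError("no profile covers these features")


print(fs_type_for({"journal"}))
print(fs_type_for({"journal", "extents"}))
```

The same driver code serves both names; only the table decides which
label a given set of enabled flags is reported under.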

> > ext4 filesystem doesn't improve the compatibility story at all.  There
> > has been renewed discussion on implementing "mounting ext3 without a
> > journal", just for a recovery mode, because ext2 will not be modified
> > to get all of these features (running e2fsck on a huge filesystem each
> > reboot would be insane).
> 
> So now you are going backwards, and implementing ext2-within-ext3?

No, it would be a readonly emergency mode, not writable ext2 at all.

> Are you ready to admit, yet, that ext3 is 100% mutable in the minds of 
> ext3 developers?

The kernel syscall interface is 100% mutable by the same criteria.
Except in each case it's not "mutable", it's "extensible", which is a
*far* different thing.

> If all the ext3 developers are on board, that just implies that there is 
> no clear definition of what "ext3" really means.  With this patch 
> series, and with future plans described here and elsewhere, the name 
> "ext3" will become more and more meaningless.

Does the continuing addition of futexes, inotify,
$FAVOURITE_FEATURE_OF_THE_DAY mean that "Linux" is more and more
meaningless?  I fail to see
much difference.  An application coded for linux-2.0's public interfaces
will, for the most part, if we do our jobs right, continue to work on
2.6.  An application coded for 2.6, expecting to use AIO, large files,
futexes and NPTL, will definitely not run on 2.0.  The incremental
extension of ext3 doesn't seem to be a fundamentally different concept.

Backwards compatibility of the kernel ABI is considered important; so in
ext3, the developers have a high regard for backwards compatibility of
on-disk data.  Personally I see that as an asset, not a problem; indeed,
it was the single most important design criterion from the outset.

--Stephen

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 20:32       ` Stephen C. Tweedie
@ 2006-06-09 20:46         ` Linus Torvalds
  2006-06-20  6:15           ` [Ext2-devel] " Qi Yong
  0 siblings, 1 reply; 104+ messages in thread
From: Linus Torvalds @ 2006-06-09 20:46 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Andrew Morton, Jeff Garzik, ext2-devel@lists.sourceforge.net,
	linux-kernel, Mingming Cao, linux-fsdevel, Andreas Dilger



On Fri, 9 Jun 2006, Stephen C. Tweedie wrote:
> 
> When is the Linux syscall interface enough?  When should we just bump it
> and cut out all the compatibility interfaces?
> 
> No, we don't; we let people configure certain obsolete bits out (a.out
> support etc), but we keep it in the tree despite the indirection cost to
> maintain multiple interfaces etc.

Right. WE ADD NEW SYSTEM CALLS. WE DO NOT EXTEND THE OLD ONES IN WAYS THAT 
MIGHT BREAK OLD USERS.

Your point was exactly what?

Btw, where did that 2TB limit number come from? Afaik, it should be 16TB 
for a 4kB filesystem, no?

		Linus

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 20:46         ` Linus Torvalds
@ 2006-06-20  6:15           ` Qi Yong
  2006-06-20  8:26             ` Laurent Vivier
  0 siblings, 1 reply; 104+ messages in thread
From: Qi Yong @ 2006-06-20  6:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Stephen C. Tweedie, Jeff Garzik, Andreas Dilger, Andrew Morton,
	ext2-devel@lists.sourceforge.net, linux-kernel, Mingming Cao,
	linux-fsdevel, alex

Linus Torvalds wrote:

>On Fri, 9 Jun 2006, Stephen C. Tweedie wrote:
>  
>
>>When is the Linux syscall interface enough?  When should we just bump it
>>and cut out all the compatibility interfaces?
>>
>>No, we don't; we let people configure certain obsolete bits out (a.out
>>support etc), but we keep it in the tree despite the indirection cost to
>>maintain multiple interfaces etc.
>>    
>>
>
>Right. WE ADD NEW SYSTEM CALLS. WE DO NOT EXTEND THE OLD ONES IN WAYS THAT 
>MIGHT BREAK OLD USERS.
>
>Your point was exactly what?
>
>Btw, where did that 2TB limit number come from? Afaik, it should be 16TB 
>for a 4kB filesystem, no?
>  
>

Partition tables describe partitions in units of one sector.
2^(32+9) = 2T

To prevent integer overflow, we should use only 31 bits of a 32-bit integer.
2^(31+12) = 8T

There are _terrible_ hacks to really get to 16T.
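
The arithmetic behind these limits can be checked with a short sketch.  It
assumes 512-byte sectors and 4 KiB blocks as in the figures above; the
helper name is made up, and the 48-bit line is included only for
comparison with the numbers in the patch announcement.

```python
SECTOR_SHIFT = 9     # 512-byte sectors
BLOCK_SHIFT = 12     # 4 KiB filesystem blocks
TIB = 2 ** 40


def limit_tib(index_bits, unit_shift):
    """Addressable size in TiB for index_bits-wide counts of 2**unit_shift bytes."""
    return 2 ** (index_bits + unit_shift) // TIB


partition_limit = limit_tib(32, SECTOR_SHIFT)       # 32-bit sector counts
signed_block_limit = limit_tib(31, BLOCK_SHIFT)     # only 31 bits usable
unsigned_block_limit = limit_tib(32, BLOCK_SHIFT)   # full 32-bit block numbers
wide_block_limit = limit_tib(48, BLOCK_SHIFT)       # 48-bit block numbers

print(partition_limit, signed_block_limit, unsigned_block_limit)
print(wide_block_limit // 1024, "PiB")
```

With these assumptions the three 32-bit cases come out at 2, 8 and 16 TiB,
and 48-bit block numbers give 2^60 bytes, i.e. 1024 PiB.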

-- qiyong

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-20  6:15           ` [Ext2-devel] " Qi Yong
@ 2006-06-20  8:26             ` Laurent Vivier
  2006-06-20  8:30               ` Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Laurent Vivier @ 2006-06-20  8:26 UTC (permalink / raw)
  To: Qi Yong
  Cc: Linus Torvalds, Andrew Morton, Jeff Garzik, Stephen C. Tweedie,
	ext2-devel@lists.sourceforge.net, linux-kernel, Mingming Cao,
	linux-fsdevel, alex, Andreas Dilger

Qi Yong wrote:
> Linus Torvalds wrote:
> 
>> On Fri, 9 Jun 2006, Stephen C. Tweedie wrote:
>>  
>>
>>> When is the Linux syscall interface enough?  When should we just bump it
>>> and cut out all the compatibility interfaces?
>>>
>>> No, we don't; we let people configure certain obsolete bits out (a.out
>>> support etc), but we keep it in the tree despite the indirection cost to
>>> maintain multiple interfaces etc.
>>>    
>>>
>> Right. WE ADD NEW SYSTEM CALLS. WE DO NOT EXTEND THE OLD ONES IN WAYS THAT 
>> MIGHT BREAK OLD USERS.
>>
>> Your point was exactly what?
>>
>> Btw, where did that 2TB limit number come from? Afaik, it should be 16TB 
>> for a 4kB filesystem, no?
>>  
>>
> 
> Partition tables describe partitions in units of one sector.
> 2^(32+9) = 2T
> 
> To prevent integer overflow, we should use only 31 bits of a 32-bit integer.
> 2^(31+12) = 8T
> 
> There's _terrible_ hacks to really get to 16T.
> 
> -- qiyong
> 

IMHO, a simple solution is to use "Logical Volume Manager" instead of partition
manager: we create 64bit filesystem in a Logical Volume, not in a partition.

"partitioning is obsolete" ;-)

Regards,
Laurent

-- 
Laurent Vivier
Bull, Architect of an Open World (TM)
http://www.bullopensource.org/ext4


* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-20  8:26             ` Laurent Vivier
@ 2006-06-20  8:30               ` Jeff Garzik
  2006-06-20  9:21                 ` Laurent Vivier
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-20  8:30 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Qi Yong, Linus Torvalds, Andrew Morton, Stephen C. Tweedie,
	ext2-devel@lists.sourceforge.net, linux-kernel, Mingming Cao,
	linux-fsdevel, alex, Andreas Dilger

Laurent Vivier wrote:
> Qi Yong wrote:
>> Linus Torvalds wrote:
>>
>>> On Fri, 9 Jun 2006, Stephen C. Tweedie wrote:
>>>  
>>>
>>>> When is the Linux syscall interface enough?  When should we just bump it
>>>> and cut out all the compatibility interfaces?
>>>>
>>>> No, we don't; we let people configure certain obsolete bits out (a.out
>>>> support etc), but we keep it in the tree despite the indirection cost to
>>>> maintain multiple interfaces etc.
>>>>    
>>>>
>>> Right. WE ADD NEW SYSTEM CALLS. WE DO NOT EXTEND THE OLD ONES IN WAYS THAT 
>>> MIGHT BREAK OLD USERS.
>>>
>>> Your point was exactly what?
>>>
>>> Btw, where did that 2TB limit number come from? Afaik, it should be 16TB 
>>> for a 4kB filesystem, no?
>>>  
>>>
>> Partition tables describe partitions in units of one sector.
>> 2^(32+9) = 2T
>>
>> To prevent integer overflow, we should use only 31 bits of a 32-bit integer.
>> 2^(31+12) = 8T
>>
>> There's _terrible_ hacks to really get to 16T.
>>
>> -- qiyong
>>
> 
> IMHO, a simple solution is to use "Logical Volume Manager" instead of partition
> manager: we create 64bit filesystem in a Logical Volume, not in a partition.

That doesn't solve anything, if you are not using a 64bit filesystem.


> "partitioning is obsolete" ;-)

LVM is nothing but a partition manager...

	Jeff




* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-20  8:30               ` Jeff Garzik
@ 2006-06-20  9:21                 ` Laurent Vivier
  0 siblings, 0 replies; 104+ messages in thread
From: Laurent Vivier @ 2006-06-20  9:21 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Qi Yong, Linus Torvalds, Andrew Morton, Stephen C. Tweedie,
	ext2-devel@lists.sourceforge.net, linux-kernel, Mingming Cao,
	linux-fsdevel, alex, Andreas Dilger

Jeff Garzik wrote:
> Laurent Vivier wrote:
>> Qi Yong wrote:
>>> Linus Torvalds wrote:
>>>
>>>> On Fri, 9 Jun 2006, Stephen C. Tweedie wrote:
>>>>  
>>>>
>>>>> When is the Linux syscall interface enough?  When should we just
>>>>> bump it
>>>>> and cut out all the compatibility interfaces?
>>>>>
>>>>> No, we don't; we let people configure certain obsolete bits out (a.out
>>>>> support etc), but we keep it in the tree despite the indirection
>>>>> cost to
>>>>> maintain multiple interfaces etc.
>>>>>   
>>>> Right. WE ADD NEW SYSTEM CALLS. WE DO NOT EXTEND THE OLD ONES IN
>>>> WAYS THAT MIGHT BREAK OLD USERS.
>>>>
>>>> Your point was exactly what?
>>>>
>>>> Btw, where did that 2TB limit number come from? Afaik, it should be
>>>> 16TB for a 4kB filesystem, no?
>>>>  
>>>>
>>> Partition tables describe partitions in units of one sector.
>>> 2^(32+9) = 2T
>>>
>>> To prevent integer overflow, we should use only 31 bits of a 32-bit
>>> integer.
>>> 2^(31+12) = 8T
>>>
>>> There's _terrible_ hacks to really get to 16T.
>>>
>>> -- qiyong
>>>
>>
>> IMHO, a simple solution is to use "Logical Volume Manager" instead of
>> partition
>> manager: we create 64bit filesystem in a Logical Volume, not in a
>> partition.
> 
> That doesn't solve anything, if you are not using a 64bit filesystem.

Sorry, I don't understand why.

You can use a 32bit filesystem too, but then you limit the size of the logical
volume to be compatible with the filesystem you use. LVM allows you to create
several 32bit volumes on a big (> 8T) disk (if one exists).

But if we think further: as the biggest disk is 750 GB (and I think that even
with HW RAID there is a HW limit of something like 4 TB), we can imagine a big
Volume Group made of several Physical Volumes and divided into Logical Volumes.
So we already use LVM, and we don't need partitions.

>> "partitioning is obsolete" ;-)
> 
> LVM is nothing but a partition manager...

LVM is more than a partition manager:
- it is arch-independent
- it is 64bit compliant
- it can gather together several disks
- it is flexible (you can add/remove/resize volume)
- it is modern (no primary/extended partitions, no limit on the number of
  partitions)

so... it's a volume manager.

Regards,
Laurent
-- 
Laurent Vivier
Bull, Architect of an Open World (TM)
http://www.bullopensource.org/ext4


* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09  1:20 Mingming Cao
  2006-06-09  2:40 ` Valdis.Kletnieks
  2006-06-09  2:49 ` Jeff Garzik
@ 2006-06-09  9:13 ` Christoph Hellwig
  2006-06-09 10:07   ` Andrew Morton
  2006-06-09 11:26   ` Alex Tomas
  2 siblings, 2 replies; 104+ messages in thread
From: Christoph Hellwig @ 2006-06-09  9:13 UTC (permalink / raw)
  To: Mingming Cao; +Cc: linux-kernel, ext2-devel, linux-fsdevel

On Thu, Jun 08, 2006 at 06:20:54PM -0700, Mingming Cao wrote:
> Current ext3 filesystem is limited to 8TB(4k block size), this is
> practically not enough for the increasing need of bigger storage as
> disks in a few years (or even now).
> 
> To address this need, there is a joint effort from RedHat, ClusterFS, IBM
> and BULL to move ext3 from a 32 bit filesystem to a 48 bit filesystem,
> expanding the ext3 filesystem limit from 8TB today to 1024 PB. The 48 bit
> ext3 is built on top of the extent map changes for ext3, originally from
> Alex Tomas. In short, the new ext3 on-disk extents format is:

What a horrible idea!  The nice things about ext3 are:

 - the rather simple and thus reliable implementation
 - the lack of incompatible ondisk changes

and the block numbers aren't the big problem concerning scalability, there's
a lot more to it, like btree(-like) structures in the allocator, parallel
allocator algorithms and a better allocation group concept.

If you guys want big storage on linux please help improve the filesystems
designed for that, e.g. jfs or xfs, instead of shoehorning it onto ext3, thus
both making ext3 less reliable for us desktop/small server users and not getting
the full thing for the big storage people either.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09  9:13 ` Christoph Hellwig
@ 2006-06-09 10:07   ` Andrew Morton
  2006-06-09 15:40     ` Jeff Garzik
  2006-06-09 11:26   ` Alex Tomas
  1 sibling, 1 reply; 104+ messages in thread
From: Andrew Morton @ 2006-06-09 10:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, ext2-devel, cmm, linux-kernel

On Fri, 9 Jun 2006 10:13:27 +0100
Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Jun 08, 2006 at 06:20:54PM -0700, Mingming Cao wrote:
> > Current ext3 filesystem is limited to 8TB(4k block size), this is
> > practically not enough for the increasing need of bigger storage as
> > disks in a few years (or even now).
> > 
> > To address this need, there is a joint effort from RedHat, ClusterFS, IBM
> > and BULL to move ext3 from a 32 bit filesystem to a 48 bit filesystem,
> > expanding the ext3 filesystem limit from 8TB today to 1024 PB. The 48 bit
> > ext3 is built on top of the extent map changes for ext3, originally from
> > Alex Tomas. In short, the new ext3 on-disk extents format is:
> 
> What a horrible idea!  The nice things about ext3 are:
> 
>  - the rather simple and thus reliable implementation

JBD isn't simple.  I don't think there's a need in this project to make
algorithmic changes in either JBD or htree, thankfully.

>  - the lack of incompatible ondisk changes

Ted&co have been pretty good at avoiding compatibility problems.

> and the block numbers aren't the big problem concerning scalability, there's
> a lot more to it, like btree(-like) structures in the allocator, parallel
> allocator algorithms and a better allocation group concept.

The performance testing results I've seen for a few of the components of
this project have been rather good, and that's the bottom line.

I don't know how the end result would compare in a bakeoff against XFS, and
I doubt if we know how much XFS performance would be improved if this
effort were diverted into that project.

But I don't think it's all as clear-cut as you imply.

> If you guys want big storage on linux please help improve the filesystems
> designed for that, e.g. jfs or xfs, instead of shoehorning it onto ext3, thus
> both making ext3 less reliable for us desktop/small server users and not getting
> the full thing for the big storage people either.

There have been pretty big changes in ext3 post-2.6.early and we've been OK
at avoiding breakage thus far.  It all comes down to how well the new
codepaths manage to avoid altering the existing ones.

That being said, ext3 isn't exactly ....  modern.  One day we'll need
something better.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 10:07   ` Andrew Morton
@ 2006-06-09 15:40     ` Jeff Garzik
  2006-06-09 16:56       ` Andrew Morton
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 15:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Hellwig, cmm, linux-kernel, ext2-devel, linux-fsdevel

Andrew Morton wrote:
> Ted&co have been pretty good at avoiding compatibility problems.

Well, extents and 48bit make that track record demonstrably worse.

Users are now forced to remember that, if they write to their filesystem 
after using either $mmver or $korgver kernels, they are locked out of 
using older kernels.

From the user's perspective, ext3 has no clear "metadata version 1", 
"metadata version 2" division.  Thus they are now forced to keep a 
matrix of kernel versions and ext3 feature flag support, to know which 
kernels are usable with which data.  It is a support nightmare.

At no point is a user ever told, in big capital letters, "IF YOU WRITE 
TO THIS FILESYSTEM, YOU CAN'T BOOT OLDER KERNELS."  There is no "click 
OK to continue with this dramatic event."

And as features continue to be added in this manner, this problem gets 
_exponentially_ worse.

On the project management side of things, I see no indication that this 
momentum will slow -- which implies to me that people will keep slapping new 
stuff into ext3, rather than directing energy towards a newer, cleaner 
ext-NG filesystem.

Dragging around back-compat really constrains freedom, and you have to 
have some sort of "pressure relief valve" (a massive, wildly 
incompatible update) eventually.

In my mind, it's analogous to locking developers into developing and 
deploying new features into a stable branch of software.  The hacks just 
get worse and worse, as you bend over backwards for back-compat.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:40     ` Jeff Garzik
@ 2006-06-09 16:56       ` Andrew Morton
  2006-06-09 17:07         ` Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Andrew Morton @ 2006-06-09 16:56 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: hch, cmm, linux-kernel, ext2-devel, linux-fsdevel

On Fri, 09 Jun 2006 11:40:03 -0400
Jeff Garzik <jeff@garzik.org> wrote:

> Users are now forced to remember that, if they write to their filesystem 
> after using either $mmver or $korgver kernels, they are locked out of 
> using older kernels.

The same happens if we create ext4 - earlier kernels don't support that,
either.

I suppose we could call it ext4, although that wouldn't make much
difference operationally.  The developers would probably choose to generate
ext4 from the same codebase as ext3 for maintainability reasons, rather
than choosing to copy-n-modify.  We'd need to see the patches to be able to
finally make that judgement.

> 
> And as features continue to be added in this manner, this problem gets 
> _exponentially_ worse.

"continue to be added"?  afaik this is the first time this has happened,
and there's no plan to do it again.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:56       ` Andrew Morton
@ 2006-06-09 17:07         ` Jeff Garzik
  2006-06-09 17:35           ` Andrew Morton
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 17:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hch, cmm, linux-kernel, ext2-devel, linux-fsdevel

Andrew Morton wrote:
> On Fri, 09 Jun 2006 11:40:03 -0400
> Jeff Garzik <jeff@garzik.org> wrote:
> 
>> Users are now forced to remember that, if they write to their filesystem 
>> after using either $mmver or $korgver kernels, they are locked out of 
>> using older kernels.
> 
> The same happens if we create ext4 - earlier kernels don't support that,
> either.
> 
> I suppose we could call it ext4, although that wouldn't make much
> difference operationally.  The developers would probably choose to generate
> ext4 from the same codebase as ext3 for maintainability reasons, rather
> than choosing to copy-n-modify.  We'd need to see the patches to be able to
> finally make that judgement.

I would propose the obvious...  'cp -a ext3 ext4', apply the extent and 
48bit patches, and then do the obvious search-n-replace.

I guarantee that developer momentum would take over from there.  Rather 
than fundamentally change ext3, let's let it stabilize.

>> And as features continue to be added in this manner, this problem gets 
>> _exponentially_ worse.
> 
> "continue to be added"?  afaik this is the first time this has happened,
> and there's no plan to do it again.

ext3 developers are _fundamentally changing_ the block allocation 
structure [in a good way].  If they can get away with it once, they will 
continue to modify ext3, adding btrees and other new gadgets.  That's 
just human nature.  For example, htree was a minor disaster, 
deployment-wise, on the distro vendor side.

I think extents and 48bit are so fundamental that it's silly to attempt 
to minimize the impact from the user's perspective, and moreover, I 
think Linux benefits more if ext3 is _not_ kept on life support this way.

We need to draw a line in the sand.  If we don't, no one ever will.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 17:07         ` Jeff Garzik
@ 2006-06-09 17:35           ` Andrew Morton
  2006-06-09 17:48             ` Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Andrew Morton @ 2006-06-09 17:35 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: hch, linux-fsdevel, ext2-devel, cmm, linux-kernel

On Fri, 09 Jun 2006 13:07:37 -0400
Jeff Garzik <jeff@garzik.org> wrote:

> I would propose the obvious...  'cp -a ext3 ext4', apply the extent and 
> 48bit patches, and then do the obvious search-n-replace.

Most of ext3 is JBD.  At least, in terms of complexity.  And I don't think
there's anything in this proposal which affects JBD, apart from changing
the blocksize.

Cloning JBD for this exercise would, I suspect, be the wrong thing to do -
the two clones would be pretty much identical, apart from some scalar
types.

I did suggest a couple of years ago that we should clone the ext3 part and
have both ext3 and ext4 use the same JBD layer - I don't know what happened
to that idea.

There has been steady, cautious but significant improvement happening in
ext3 over the past few years.  I'd expect that to continue, although
perhaps at a lower rate.  Having to apply the same changes to two
filesystems would be an obvious loss.

It comes down to looking at the patches, and I haven't done that in quite
some time.  Ideally the new functionality would all be under CONFIG_foo,
but I do not know if that is being proposed here?

> We need to draw a line in the sand.  If we don't, no one ever will.

You speak as if this is something which has happened before, or that it will
happen again.

All that being said, Linux's filesystems are looking increasingly crufty
and we are getting to the time where we would benefit from a greenfield
start-a-new-one.  That new one might even be based on reiser4 - has anyone
looked?  It's been sitting around for a couple of years.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 17:35           ` Andrew Morton
@ 2006-06-09 17:48             ` Jeff Garzik
  2006-06-09 17:59               ` Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 17:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hch, linux-fsdevel, ext2-devel, cmm, linux-kernel

Andrew Morton wrote:
> On Fri, 09 Jun 2006 13:07:37 -0400
> Jeff Garzik <jeff@garzik.org> wrote:
> 
>> I would propose the obvious...  'cp -a ext3 ext4', apply the extent and 
>> 48bit patches, and then do the obvious search-n-replace.
> 
> Most of ext3 is JBD.  At least, in terms of complexity.  And I don't think
> there's anything in this proposal which affects JBD, apart from changing
> the blocksize.
> 
> Cloning JBD for this exercise would, I suspect, be the wrong thing to do -
> the two clones would be pretty much identical, apart from some scalar
> types.
> 
> I did suggest a couple of years ago that we should clone the ext3 part and
> have both ext3 and ext4 use the same JBD layer - I don't know what happened
> to that idea.

The JBD API is reasonably distinct, so IMO this would be a logical next 
step.  I would hope they could use the same JBD, so, I strongly agree...


> There has been steady, cautious but significant improvement happening in
> ext3 over the past few years.  I'd expect that to continue, although
> perhaps at a lower rate.  Having to apply the same changes to two
> filesystems would be an obvious loss.

I disagree completely...  it would be an obvious win:  people who want 
stability get that, people who want new features get that too.


> It comes down to looking at the patches, and I haven't done that in quite
> some time.  Ideally the new functionality would all be under CONFIG_foo,
> but I do not know if that is being proposed here?
> 
>> We need to draw a line in the sand.  If we don't, no one ever will.
> 
> You speak as if this is something which has happened before, or that it will
> happen again.
> 
> All that being said, Linux's filesystems are looking increasingly crufty
> and we are getting to the time where we would benefit from a greenfield
> start-a-new-one.  That new one might even be based on reiser4 - has anyone
> looked?  It's been sitting around for a couple of years.

reiser4 actually has this same problem, but worse.  It has pluggable 
metadata even to the point of supporting plugin-style metadata development.

If we can successfully devolve a filesystem to metadata and algorithm 
plugins, that should be done at the VFS level, and not called "reiser4".

But in the absence of a different VFS API, I think it is the most 
practical of all the options to open the floodgates to ext4 rather than 
ext3.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 17:48             ` Jeff Garzik
@ 2006-06-09 17:59               ` Jeff Garzik
  2006-06-09 18:27                 ` [Ext2-devel] " Mike Snitzer
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 17:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: hch, linux-fsdevel, ext2-devel, cmm, linux-kernel

Jeff Garzik wrote:
> I disagree completely...  it would be an obvious win:  people who want 
> stability get that, people who want new features get that too.

And developers have a better outlet for their wacky developmental urges...

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 17:59               ` Jeff Garzik
@ 2006-06-09 18:27                 ` Mike Snitzer
  2006-06-09 18:54                   ` Jeff Garzik
  2006-06-10 13:49                   ` Adrian Bunk
  0 siblings, 2 replies; 104+ messages in thread
From: Mike Snitzer @ 2006-06-09 18:27 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, hch, linux-fsdevel, ext2-devel, cmm, linux-kernel

On 6/9/06, Jeff Garzik <jeff@garzik.org> wrote:
> Jeff Garzik wrote:
> > I disagree completely...  it would be an obvious win:  people who want
> > stability get that, people who want new features get that too.
>
> And developers have a better outlet for their wacky developmental urges...

And no real-world near-term progress is made for production users with
modern requirements. What you're advocating breeds instability in the
near-term.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:27                 ` [Ext2-devel] " Mike Snitzer
@ 2006-06-09 18:54                   ` Jeff Garzik
  2006-06-09 19:22                     ` Alex Tomas
  2006-06-10 13:49                   ` Adrian Bunk
  1 sibling, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 18:54 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Andrew Morton, hch, linux-fsdevel, ext2-devel, cmm, linux-kernel

Mike Snitzer wrote:
> On 6/9/06, Jeff Garzik <jeff@garzik.org> wrote:
>> Jeff Garzik wrote:
>> > I disagree completely...  it would be an obvious win:  people who want
>> > stability get that, people who want new features get that too.
>>
>> And developers have a better outlet for their wacky developmental 
>> urges...
> 
> And no real-world near-term progress is made for production users with
> modern requirements. What you're advocating breeds instability in the
> near-term.

Constantly patching the main, "stable" Linux filesystem breeds 
instability today.

	Jeff




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:54                   ` Jeff Garzik
@ 2006-06-09 19:22                     ` Alex Tomas
  2006-06-09 22:49                       ` Valdis.Kletnieks
  0 siblings, 1 reply; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 19:22 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, ext2-devel, linux-kernel, hch, cmm, linux-fsdevel


what if the proposed patch is safer than an average fix?
(given that the new code is simply unused unless enabled)

thanks, Alex

>>>>> Jeff Garzik (JG) writes:

 JG> Mike Snitzer wrote:
 >> On 6/9/06, Jeff Garzik <jeff@garzik.org> wrote:
 >>> Jeff Garzik wrote:
 >>> > I disagree completely...  it would be an obvious win:  people who want
 >>> > stability get that, people who want new features get that too.
 >>> 
 >>> And developers have a better outlet for their wacky developmental 
 >>> urges...
 >> 
 >> And no real-world near-term progress is made for production users with
 >> modern requirements. What you're advocating breeds instability in the
 >> near-term.

 JG> Constantly patching the main, "stable" Linux filesystem breeds 
 JG> instability today.

 JG> 	Jeff






^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 19:22                     ` Alex Tomas
@ 2006-06-09 22:49                       ` Valdis.Kletnieks
  2006-06-09 23:34                         ` [Ext2-devel] " Andreas Dilger
  0 siblings, 1 reply; 104+ messages in thread
From: Valdis.Kletnieks @ 2006-06-09 22:49 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Andrew Morton, Jeff Garzik, ext2-devel, linux-kernel, hch, cmm,
	linux-fsdevel


[-- Attachment #1.1: Type: text/plain, Size: 337 bytes --]

On Fri, 09 Jun 2006 23:22:23 +0400, Alex Tomas said:
> what if proposed patch is safer than an average fix?
> (given that it's just out of usage unless enabled)

Those are the *dangerous* patches, because they usually contain bugs
that weren't tripped over by the 6 people who enabled it while it
was bouncing around in the -mm tree....

[-- Attachment #1.2: Type: application/pgp-signature, Size: 226 bytes --]


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 22:49                       ` Valdis.Kletnieks
@ 2006-06-09 23:34                         ` Andreas Dilger
  0 siblings, 0 replies; 104+ messages in thread
From: Andreas Dilger @ 2006-06-09 23:34 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Alex Tomas, Andrew Morton, Jeff Garzik, ext2-devel, linux-kernel,
	hch, cmm, linux-fsdevel

On Jun 09, 2006  18:49 -0400, Valdis.Kletnieks@vt.edu wrote:
> On Fri, 09 Jun 2006 23:22:23 +0400, Alex Tomas said:
> > what if proposed patch is safer than an average fix?
> > (given that it's just out of usage unless enabled)
> 
> Those are the *dangerous* patches, because they usually contain bugs
> that weren't tripped over by the 6 people who enabled it while it
> was bouncing around in the -mm tree....

Umm, in case you didn't know, the extent patch which is the primary issue
of discussion here (not the whole set of 64-bit clean changes though) was run
for MILLIONS of hours under very high IO load on the largest computer
systems in the world for the last year or so.  It is easy to get millions
of hours of usage if there are thousands of servers running this code...

Yes, I have no doubt there will be bugs in the code because the usage
pattern is different for different environments, but we aren't advocating
the inclusion of something major like this that was just written
yesterday in someone's basement.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 18:27                 ` [Ext2-devel] " Mike Snitzer
  2006-06-09 18:54                   ` Jeff Garzik
@ 2006-06-10 13:49                   ` Adrian Bunk
  2006-06-10 13:51                     ` Christoph Hellwig
  1 sibling, 1 reply; 104+ messages in thread
From: Adrian Bunk @ 2006-06-10 13:49 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jeff Garzik, Andrew Morton, hch, linux-fsdevel, ext2-devel, cmm,
	linux-kernel

On Fri, Jun 09, 2006 at 02:27:53PM -0400, Mike Snitzer wrote:
> On 6/9/06, Jeff Garzik <jeff@garzik.org> wrote:
> >Jeff Garzik wrote:
> >> I disagree completely...  it would be an obvious win:  people who want
> >> stability get that, people who want new features get that too.
> >
> >And developers have a better outlet for their wacky developmental urges...
> 
> And no real-world near-term progress is made for production users with
> modern requirements. What you're advocating breeds instability in the
> near-term.

There's also the old-fashioned "no regressions" requirement.

You are trading near-term instability for the few users with "modern 
requirements" against possible regressions for a large userbase.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-10 13:49                   ` Adrian Bunk
@ 2006-06-10 13:51                     ` Christoph Hellwig
  2006-06-10 14:54                       ` Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2006-06-10 13:51 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Mike Snitzer, Jeff Garzik, Andrew Morton, hch, linux-fsdevel,
	ext2-devel, cmm, linux-kernel

On Sat, Jun 10, 2006 at 03:49:46PM +0200, Adrian Bunk wrote:
> > And no real-world near-term progress is made for production users with
> > modern requirements. What you're advocating breeds instability in the
> > near-term.
> 
> There's also the old-fashioned "no regressions" requirement.
> 
> You are trading near-term instability for the few users with "modern 
> requirements" against possible regressions for a large userbase.

Alex mentioned a few times that the extents code just adds three ifs.
I'm pretty sure that will not give you any regressions in the existing
codebase.  Can we concentrate on the more useful discussion topics now?


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-10 13:51                     ` Christoph Hellwig
@ 2006-06-10 14:54                       ` Jeff Garzik
  2006-06-10 18:01                         ` [Ext2-devel] " Andreas Dilger
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-10 14:54 UTC (permalink / raw)
  To: Christoph Hellwig, Andrew Morton
  Cc: ext2-devel, linux-kernel, cmm, linux-fsdevel, Adrian Bunk

Christoph Hellwig wrote:
> On Sat, Jun 10, 2006 at 03:49:46PM +0200, Adrian Bunk wrote:
>>> And no real-world near-term progress is made for production users with
>>> modern requirements. What you're advocating breeds instability in the
>>> near-term.
>> There's also the old-fashioned "no regressions" requirement.
>>
>> You are trading near-term instability for the few users with "modern 
>> requirements" against possible regressions for a large userbase.
> 
> Alex mentioned a few times that the extents code just adds three if.
> I'm pretty sure that will not give you any regressions in the existing
> codebase.  Can we concentrate on the more useful discussion topics now?

Alex is off by an order of magnitude.  I've re-read the 13-patch series, 
and this is the result of the review:

There are _five_ "if (new) .. else .." constructs added in JBD alone.

Three added in extent map support.

Twenty-seven (27) such constructs in 48-bit physical block support.

Two more in 48-bit ACL support.

And finally, the superblock changes don't add any branches, unlike the 
other code, but they do double the endian conversion work that 
-every- user must do, even if they don't use 48bit at all.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-10 14:54                       ` Jeff Garzik
@ 2006-06-10 18:01                         ` Andreas Dilger
  0 siblings, 0 replies; 104+ messages in thread
From: Andreas Dilger @ 2006-06-10 18:01 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Christoph Hellwig, Andrew Morton, Adrian Bunk, Mike Snitzer,
	linux-fsdevel, ext2-devel, cmm, linux-kernel

On Jun 10, 2006  10:54 -0400, Jeff Garzik wrote:
> Christoph Hellwig wrote:
> >Alex mentioned a few times that the extents code just adds three if.
> >I'm pretty sure that will not give you any regressions in the existing
> >codebase.  Can we concentrate on the more useful discussion topics now?
> 
> Alex is off by an order of magnitude.  I've re-read the 13-patch series, 
> and this is the result of the review:

Thanks for at least looking at the code, which was the intention of posting
the patches...  It caused quite a few more ruffled feathers than we expected.

> Three added in extent map support.

As Christoph quoted Alex, "the extents code", which you confirm is 3 "ifs".

> There are _five_ "if (new) .. else .." constructs added in JBD alone.

Actually, 64-bit support in the JBD code was written by Zach Brown
for OCFS, so I think they want this patch in the kernel regardless.
It's a relatively simple change though - all conditional on a single flag.

> Twenty-seven (27) such constructs in 48-bit physical block support.

Though there are really only 2 conditionals (in macros, one for read and
one for write) that are used everywhere, so it's not as bad as it seems.
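
To make the "one for read, one for write" point concrete, here is a minimal
sketch of what such a helper pair could look like. This is not code from the
patch series: the struct, the field names and the has_48bit flag are all
hypothetical, and endian conversion is left out. It only assumes that a 48-bit
block number is stored as a 32-bit low part plus a 16-bit high part.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical on-disk descriptor; names are illustrative only. */
struct demo_desc {
    uint32_t blk_lo;   /* low 32 bits of the block number */
    uint16_t blk_hi;   /* high 16 bits, used only when 48-bit is enabled */
};

/* Read side: the single conditional is the has_48bit test. */
uint64_t demo_blk_read(const struct demo_desc *d, int has_48bit)
{
    uint64_t blk = d->blk_lo;

    if (has_48bit)
        blk |= (uint64_t)d->blk_hi << 32;
    return blk;
}

/* Write side: again one conditional, mirroring the read helper. */
void demo_blk_write(struct demo_desc *d, uint64_t blk, int has_48bit)
{
    /* the value must fit in 48 bits, or in 32 bits on an old-format fs */
    assert(blk < ((uint64_t)1 << 48));
    assert(has_48bit || blk <= UINT32_MAX);

    d->blk_lo = (uint32_t)blk;
    if (has_48bit)
        d->blk_hi = (uint16_t)(blk >> 32);
}
```

Every caller going through the two helpers sees the same conditional, which is
why the count of call sites overstates the number of distinct branches.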

> Two more in 48-bit ACL support.
> 
> And finally, the superblock changes don't add any branches, like the 
> other code does, but it does double the endian conversion work that 
> -every- user must do, even if they don't use 48bit at all.

These are all related to 48-bit filesystem support, not strictly
extents.  Much of the 48-bit code is dependent upon CONFIG_LBD or
sizeof(ext3_fsblk_t), so if people have no desire to use large (2TB+) or
larger (16TB+) filesystems these conditionals disappear at compile time.
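
A sketch of how such a conditional can vanish at compile time, under the
assumption that the block type narrows to 32 bits when the config option is
off. DEMO_LBD and demo_fsblk_t are stand-ins invented for this example, not
the real CONFIG_LBD or ext3_fsblk_t definitions.

```c
#include <assert.h>
#include <stdint.h>

/* DEMO_LBD is a hypothetical stand-in for a config option like CONFIG_LBD. */
#ifdef DEMO_LBD
typedef uint64_t demo_fsblk_t;
#else
typedef uint32_t demo_fsblk_t;
#endif

/* Narrow a 64-bit value to the configured block type. */
demo_fsblk_t demo_to_fsblk(uint64_t blk)
{
    /* with the 32-bit type, larger values cannot be represented */
    assert(sizeof(demo_fsblk_t) > 4 || blk <= UINT32_MAX);
    return (demo_fsblk_t)blk;
}

/* High 16 bits to store on disk; always 0 when the type is 32 bits wide. */
uint16_t demo_blk_hi(demo_fsblk_t blk)
{
    /* sizeof() is a compile-time constant, so the compiler can discard
     * whichever branch is dead for the configured type */
    if (sizeof(demo_fsblk_t) > 4)
        return (uint16_t)((uint64_t)blk >> 32);
    return 0;
}
```

Built without DEMO_LBD, demo_blk_hi() reduces to "return 0" and costs nothing
at run time.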

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09  9:13 ` Christoph Hellwig
  2006-06-09 10:07   ` Andrew Morton
@ 2006-06-09 11:26   ` Alex Tomas
  2006-06-09 14:23     ` [Ext2-devel] " Jeff Garzik
  1 sibling, 1 reply; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 11:26 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, ext2-devel, Mingming Cao, linux-kernel

>>>>> Christoph Hellwig (CH) writes:

 CH> If you guys want big storage on linux please help improve the filesystems
 CH> designed for that, e.g. jfs or xfs, instead of shoehorning it onto ext3, thus
 CH> both making ext3 less reliable for us desktop/small server users and not getting
 CH> the full thing for the big storage people either.

proposed patches don't touch existing code paths.
extents may be enabled/disabled on per-file basis.

thanks, Alex

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 11:26   ` Alex Tomas
@ 2006-06-09 14:23     ` Jeff Garzik
  2006-06-09 14:33       ` Alex Tomas
  2006-06-09 14:34       ` Alex Tomas
  0 siblings, 2 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 14:23 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Christoph Hellwig, Mingming Cao, linux-kernel, ext2-devel,
	linux-fsdevel

Alex Tomas wrote:
>>>>>> Christoph Hellwig (CH) writes:
> 
>  CH> If you guys want big storage on linux please help improve the filesystems
>  CH> designed for that, e.g. jfs or xfs, instead of shoehorning it onto ext3, thus
>  CH> both making ext3 less reliable for us desktop/small server users and not getting
>  CH> the full thing for the big storage people either.
> 
> proposed patches don't touch existing code paths.
> extents may be enabled/disabled on per-file basis.

And thus, inodes are progressively incompatible with older kernels. 
Boot into an older kernel, and you can now only read half your 
filesystem (if it even allows mount at all).

	Jeff




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 14:23     ` [Ext2-devel] " Jeff Garzik
@ 2006-06-09 14:33       ` Alex Tomas
  2006-06-09 14:34       ` Alex Tomas
  1 sibling, 0 replies; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 14:33 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Alex Tomas, Christoph Hellwig, Mingming Cao, linux-kernel,
	ext2-devel, linux-fsdevel

>>>>> Jeff Garzik (JG) writes:

 JG> And thus, inodes are progressively incompatible with older
 JG> kernels. Boot into an older kernel, and you can now only read half
 JG> your filesystem (if it even allows mount at all).

nope, you aren't allowed to mount an fs with extents-enabled files
using an ext3 that doesn't have the feature compiled in. the same will
happen if you call it ext4.
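
The mount-time refusal can be pictured as a mask check on incompatible
feature bits. The following is only a sketch of that idea: the flag names and
values are hypothetical and are not copied from the ext3 headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical incompatible-feature bits, for illustration only. */
#define DEMO_INCOMPAT_FILETYPE 0x0002u
#define DEMO_INCOMPAT_EXTENTS  0x0040u

/* Returns 1 if a kernel that understands the `supported` bits may mount a
 * filesystem whose superblock carries the `fs_incompat` bits, 0 otherwise.
 * Any bit the kernel does not know about blocks the mount. */
int demo_can_mount(uint32_t fs_incompat, uint32_t supported)
{
    return (fs_incompat & ~supported) == 0;
}

/* Usage: an old kernel versus a filesystem that has extents enabled. */
void demo_mount_example(void)
{
    uint32_t old_kernel = DEMO_INCOMPAT_FILETYPE;
    uint32_t new_kernel = DEMO_INCOMPAT_FILETYPE | DEMO_INCOMPAT_EXTENTS;
    uint32_t fs = DEMO_INCOMPAT_FILETYPE | DEMO_INCOMPAT_EXTENTS;

    assert(!demo_can_mount(fs, old_kernel)); /* mount refused outright */
    assert(demo_can_mount(fs, new_kernel));  /* mount allowed */
}
```

With a check of this shape the old kernel never gets as far as reading only
half the files; it declines the whole filesystem.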

thanks, Alex

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 14:23     ` [Ext2-devel] " Jeff Garzik
  2006-06-09 14:33       ` Alex Tomas
@ 2006-06-09 14:34       ` Alex Tomas
  2006-06-09 14:35         ` Jeff Garzik
  1 sibling, 1 reply; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 14:34 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: ext2-devel, linux-kernel, Christoph Hellwig, Mingming Cao,
	linux-fsdevel, Alex Tomas

>>>>> Jeff Garzik (JG) writes:

 JG> And thus, inodes are progressively incompatible with older
 JG> kernels. Boot into an older kernel, and you can now only read half
 JG> your filesystem (if it even allows mount at all).

nope, you aren't allowed to mount an fs with extents-enabled files
using an ext3 that doesn't have the feature compiled in. the same will
happen if you call it ext4.

thanks, Alex

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 14:34       ` Alex Tomas
@ 2006-06-09 14:35         ` Jeff Garzik
  2006-06-09 14:57           ` Alex Tomas
  0 siblings, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 14:35 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Christoph Hellwig, linux-fsdevel, ext2-devel, Mingming Cao,
	linux-kernel

Alex Tomas wrote:
>>>>>> Jeff Garzik (JG) writes:
> 
>  JG> And thus, inodes are progressively incompatible with older
>  JG> kernels. Boot into an older kernel, and you can now only read half
>  JG> your filesystem (if it even allows mount at all).
> 
> nope, you aren't allowed to mount fs with extents-enabled files
> by an ext3 which doesn't have the feature compiled in. the same will
> happen if you call it ext4.

This is my point...  why increase user confusion by calling it ext3, then?

Extents magnify the "what ext3 filesystem am I talking to, today?" problem.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 14:35         ` Jeff Garzik
@ 2006-06-09 14:57           ` Alex Tomas
  2006-06-09 15:17             ` [Ext2-devel] " Jeff Garzik
  0 siblings, 1 reply; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 14:57 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: ext2-devel, linux-kernel, Christoph Hellwig, Mingming Cao,
	linux-fsdevel, Alex Tomas

>>>>> Jeff Garzik (JG) writes:

 JG> Alex Tomas wrote:
 >>>>>>> Jeff Garzik (JG) writes:
 JG> And thus, inodes are progressively incompatible with older
 JG> kernels. Boot into an older kernel, and you can now only read half
 JG> your filesystem (if it even allows mount at all).
 >> nope, you aren't allowed to mount fs with extents-enabled files
 >> by an ext3 which doesn't have the feature compiled in. the same will
 >> happen if you call it ext4.

 JG> This is my point...  why increase user confusion by calling it ext3, then?

by default it's still good old ext3 without extents. user should
enable it explicitly. for him, this means the feature is ready
to be used anytime. the only thing he needs is to (re)mount fs
with the option. for us, this means: a) a single source tree -
easy to maintain b) we must be clear with user that the feature
isn't backward compatible

thanks, Alex

PS. in the end this is just ext3 with one more feature ...

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 14:57           ` Alex Tomas
@ 2006-06-09 15:17             ` Jeff Garzik
  2006-06-09 16:21               ` Mike Snitzer
  2006-06-09 16:56               ` Andreas Dilger
  0 siblings, 2 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 15:17 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Christoph Hellwig, Mingming Cao, linux-kernel, ext2-devel,
	linux-fsdevel

Alex Tomas wrote:
> PS. in the end this is just ext3 with one more feature ...

Incorrect.  You have to look at ext3 development over time.  This is a 
PATTERN with ext3 development:  mutating the metadata over time in a 
progressively incompatible manner.

You have this thing called "ext3", which fools an admin into thinking 
they can use their filesystem with any kernel that has "ext3" support. 
That's somewhat true today, but with extents it will become false. 
Having a mutating definition of "ext3" is a convenience for developers, 
and for users WHO ONLY MOVE FORWARD in kernel versions.

A 48bit ext3 filesystem with extents is completely unusable in 2.4.30's 
"ext3" or 2.6.10's "ext3".  Users are forced to hunt down the specific 
kernel version when an incompatible feature was added to ext3.  How can 
that possibly be described as "user friendly"?

"Which ext3 am I talking to, today?"
"And which kernels am I locked into, in order to talk to my filesystem?"

Not all users are big production houses that plan their filesystem 
metadata migration months in advance!  I _guarantee_ some users will 
boot into ext3-with-extents, use it for a while, and then try to 
downgrade for whatever reason...  only to find they have been LOCKED 
OUT.  That is a very real world situation, guys.

	Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:17             ` [Ext2-devel] " Jeff Garzik
@ 2006-06-09 16:21               ` Mike Snitzer
  2006-06-09 16:27                 ` Jeff Garzik
  2006-06-09 16:33                 ` Alex Tomas
  2006-06-09 16:56               ` Andreas Dilger
  1 sibling, 2 replies; 104+ messages in thread
From: Mike Snitzer @ 2006-06-09 16:21 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Alex Tomas, Christoph Hellwig, Mingming Cao, linux-kernel,
	ext2-devel, linux-fsdevel

On 6/9/06, Jeff Garzik <jeff@garzik.org> wrote:
> Alex Tomas wrote:
> > PS. in the end this is just ext3 with one more feature ...
>
> Incorrect.  You have to look at ext3 development over time.  This is a
> PATTERN with ext3 development:  mutating the metadata over time in a
> progressively incompatible manner.
>
> You have this thing called "ext3", which fools an admin into thinking
> they can use their filesystem with any kernel that has "ext3" support.
> That's somewhat true today, but with extents it will become false.
> Having a mutating definition of "ext3" is a convenience for developers,
> and for users WHO ONLY MOVE FORWARD in kernel versions.
>
> A 48bit ext3 filesystem with extents is completely unusable in 2.4.30's
> "ext3" or 2.6.10's "ext3".  Users are forced to hunt down the specific
> kernel version when an incompatible feature was added to ext3.  How can
> that possibly be described as "user friendly"?
>
> "Which ext3 am I talking to, today?"
> "And which kernels am I locked into, in order to talk to my filesystem?"
>
> Not all users are big production houses that plan their filesystem
> metadata migration months in advance!  I _guarantee_ some users will
> boot into ext3-with-extents, use it for a while, and then try to
> downgrade for whatever reason...  only to find they have been LOCKED
> OUT.  That is a very real world situation, guys.

Jeff,

I think all of us do understand what you're saying and on some level
are willing to accept that ext3-with-extents is in fact worthy of
branching to ext4, hence the url that has hosted the development of
extents (mballoc, delalloc, 48bit etc):
http://www.bullopensource.org/ext4/

But it _seems_ you're trying to paint ALL the ext3-developers as a
narrow minded lot.  If and when users decide to enable ext3 extents on
their filesystems they will presumably understand that doing so
precludes their ability to boot older kernels (steps can be taken to
make them well aware of this). The "real world situation" you refer
to, while hypothetically valid, isn't something informed
ext3-with-extents users will _ever_ elect to do.

Once a compelling feature is introduced Linux users embrace it and
never look back (provided it is stable!).  The real risk is the
(in)stability of all these ext3 improvements.  Stability is obviously
a requirement for merging these changes but I for one find it
refreshing that the current desire is to merge extents with ext3
(implicitly speaks to its stability when you couple that desire with
the fact that so many ext3 stakeholders are onboard!).

And as an aside, merging extents with ext3 forces ext3-developers to
be somewhat conservative about what bells and whistles they'd be
introducing moving forward.  The worst thing would be for these ext3
improvements to get merged into a new ext4 that becomes widely known
as "the experimental ext3++".  I suppose developer discipline would
prevent such an unfortunate distinction but a new ext4 sandbox _could_
open the flood gates.

Developers never _want_ to branch (maintenance-hell), the question
becomes: do the risks associated with ext3-with-extents' backward
incompatibility _really_ justify the branch?

Mike

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:21               ` Mike Snitzer
@ 2006-06-09 16:27                 ` Jeff Garzik
  2006-06-09 16:48                   ` Alex Tomas
  2006-06-09 16:33                 ` Alex Tomas
  1 sibling, 1 reply; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 16:27 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Alex Tomas, Christoph Hellwig, Mingming Cao, linux-kernel,
	ext2-devel, linux-fsdevel

Mike Snitzer wrote:
> Developers never _want_ to branch (maintenance-hell), the question
> becomes: do the risks associated with ext3-with-extents' backward
> incompatibility _really_ justify the branch?


It's also a question of...  why keep adding modernizing features to 
ext3, thus keeping it on life support, but just barely?  If we are going 
to modernize the _main Linux filesystem_, let's not do it in a way that 
is slow, and ties our hands.

	Jeff



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:27                 ` Jeff Garzik
@ 2006-06-09 16:48                   ` Alex Tomas
  0 siblings, 0 replies; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 16:48 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Mike Snitzer, Alex Tomas, Christoph Hellwig, Mingming Cao,
	linux-kernel, ext2-devel, linux-fsdevel

>>>>> Jeff Garzik (JG) writes:

 JG> It's also a question of...  why keep adding modernizing features to
 JG> ext3, thus keeping it on life support, but just barely?  If we are
 JG> going to modernize the _main Linux filesystem_, let's not do it in a
 JG> way that is slow, and ties our hands.

I think trying to solve all problems at once will take much longer.

thanks, Alex

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:21               ` Mike Snitzer
  2006-06-09 16:27                 ` Jeff Garzik
@ 2006-06-09 16:33                 ` Alex Tomas
  2006-06-09 16:37                   ` [Ext2-devel] " Jeff Garzik
  2006-06-09 22:52                   ` Valdis.Kletnieks
  1 sibling, 2 replies; 104+ messages in thread
From: Alex Tomas @ 2006-06-09 16:33 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jeff Garzik, ext2-devel, linux-kernel, Christoph Hellwig,
	Mingming Cao, linux-fsdevel, Alex Tomas

>>>>> Mike Snitzer (MS) writes:

 MS> precludes their ability to boot older kernels (steps can be taken to
 MS> make them well aware of this). The "real world situation" you refer
 MS> to, while hypothetically valid, isn't something informed
 MS> ext3-with-extents users will _ever_ elect to do.

one who needs/wants to go back may get rid of extents by:
a) remounting w/o extents option
b) copying new-fashion-style files so that copies use blockmap
c) dropping extents feature in superblock

thanks, Alex

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:33                 ` Alex Tomas
@ 2006-06-09 16:37                   ` Jeff Garzik
  2006-06-09 22:52                   ` Valdis.Kletnieks
  1 sibling, 0 replies; 104+ messages in thread
From: Jeff Garzik @ 2006-06-09 16:37 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Mike Snitzer, Christoph Hellwig, Mingming Cao, linux-kernel,
	ext2-devel, linux-fsdevel

Alex Tomas wrote:
>>>>>> Mike Snitzer (MS) writes:
> 
>  MS> precludes their ability to boot older kernels (steps can be taken to
>  MS> make them well aware of this). The "real world situation" you refer
>  MS> to, while hypothetically valid, isn't something informed
>  MS> ext3-with-extents users will _ever_ elect to do.
> 
> one who needs/wants to go back may get rid of extents by:
> a) remounting w/o extents option
> b) copying new-fashion-style files so that copies use blockmap
> c) dropping extents feature in superblock

More likely, they will just backup+restore rather than go through all that.

After leafing through a 50-page manual to match up kernel versions with 
ext3 features, to see which older kernels will (or won't) require all 
this work.

	Jeff




^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:33                 ` Alex Tomas
  2006-06-09 16:37                   ` [Ext2-devel] " Jeff Garzik
@ 2006-06-09 22:52                   ` Valdis.Kletnieks
  2006-06-09 23:21                     ` Andreas Dilger
  1 sibling, 1 reply; 104+ messages in thread
From: Valdis.Kletnieks @ 2006-06-09 22:52 UTC (permalink / raw)
  To: Alex Tomas
  Cc: Mike Snitzer, Jeff Garzik, Christoph Hellwig, Mingming Cao,
	linux-kernel, ext2-devel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 565 bytes --]

On Fri, 09 Jun 2006 20:33:18 +0400, Alex Tomas said:

> one who needs/wants to go back may get rid of extents by:
> a) remounting w/o extents option
> b) copying new-fashion-style files so that copies use blockmap
> c) dropping extents feature in superblock

OK.. Obviously my brain is tiny and easily overfilled.

Given that the whole alleged problem with extents is that they're not
backward compatible, how do you read the files in (b) so that you can copy
them, if the data is in the non-compatible extents that you can't read because
you've disabled extents?

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 22:52                   ` Valdis.Kletnieks
@ 2006-06-09 23:21                     ` Andreas Dilger
  2006-06-10  1:21                       ` Valdis.Kletnieks
  0 siblings, 1 reply; 104+ messages in thread
From: Andreas Dilger @ 2006-06-09 23:21 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Alex Tomas, Jeff Garzik, ext2-devel, linux-kernel,
	Christoph Hellwig, Mingming Cao, linux-fsdevel

On Jun 09, 2006  18:52 -0400, Valdis.Kletnieks@vt.edu wrote:
> On Fri, 09 Jun 2006 20:33:18 +0400, Alex Tomas said:
> > one who needs/wants to go back may get rid of extents by:
> > a) remounting w/o extents option
> > b) copying new-fashion-style files so that copies use blockmap
> > c) dropping extents feature in superblock
> 
> OK.. Obviously my brain is tiny and easily overfilled.

...

> Given that the whole alleged problem with extents is that they're not
> backward compatible, how do you read the files in (b) so that you can copy
> them, if the data is in the non-compatible extents that you can't read because
> you've disabled extents?

You mount with the new kernel without "-o extents", and find files with
extents "lsattr -R /mnt/tmp | awk '/----e / { print $2 }'", copy those
files, mv over old files, unmount.
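
The scan step above can be sketched as a small filter over lsattr output.
This is only a sketch under assumptions: it takes the first field of each
line to be the flag string, treats an "e" in that field as the mark of an
extent-mapped inode, and skips the "directory:" header lines and blank
lines that "lsattr -R" prints.  The function name list_extent_files is
made up for illustration.

```shell
# Print the paths of extent-mapped files from `lsattr -R`-style lines on stdin.
# Assumption: field 1 is the flag string (letters and dashes only) and an "e"
# in it marks an extent-mapped inode.  Header lines such as "/mnt/tmp/dir:"
# contain "/" or ":" in field 1 and are therefore skipped.
list_extent_files() {
    awk '$1 ~ /^[-A-Za-z]+$/ && $1 ~ /e/ && NF >= 2 {
        sub(/^[^ ]+ +/, "")   # drop the flag field; the path may contain spaces
        print
    }'
}
```

It would be used as "lsattr -R /mnt/tmp | list_extent_files", passing an
absolute path so that directory header lines always start with "/".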

A similar thing is necessary for ext3 filesystems before you can mount them
as ext2 - they can't be mounted as ext2 until the journal is recovered
(an unrecovered journal is an incompatible feature).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 23:21                     ` Andreas Dilger
@ 2006-06-10  1:21                       ` Valdis.Kletnieks
  2006-06-10  2:09                         ` [Ext2-devel] " Andreas Dilger
  0 siblings, 1 reply; 104+ messages in thread
From: Valdis.Kletnieks @ 2006-06-10  1:21 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Jeff Garzik, ext2-devel, linux-kernel, Christoph Hellwig,
	Mingming Cao, linux-fsdevel, Alex Tomas

[-- Attachment #1.1: Type: text/plain, Size: 795 bytes --]

On Fri, 09 Jun 2006 17:21:08 MDT, Andreas Dilger said:

> You mount with the new kernel without "-o extents", and find files with
> extents "lsattr -R /mnt/tmp | awk '/----e / { print $2 }'", copy those
> files, mv over old files, unmount.

How do you "copy those files" when you don't have extent support at that
point?  Remember - the whole problem here is that if you don't have
extent support, you can't read the file, it's backward-incompatible.
(If you *are* able to read the file even without extents, then this whole
thread is total BS).

You can certainly at least try to copy them to another file system
while the source *is* mounted with -o extents, and then mount without it
and copy the files back, but (a) that isn't what you said and (b) it doesn't
work for files over 2T or so..

[-- Attachment #1.2: Type: application/pgp-signature, Size: 226 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]

[-- Attachment #3: Type: text/plain, Size: 161 bytes --]

_______________________________________________
Ext2-devel mailing list
Ext2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ext2-devel

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-10  1:21                       ` Valdis.Kletnieks
@ 2006-06-10  2:09                         ` Andreas Dilger
  2006-06-10  2:45                           ` Nicholas Miell
  0 siblings, 1 reply; 104+ messages in thread
From: Andreas Dilger @ 2006-06-10  2:09 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Alex Tomas, Jeff Garzik, ext2-devel, linux-kernel,
	Christoph Hellwig, Mingming Cao, linux-fsdevel

On Jun 09, 2006  21:21 -0400, Valdis.Kletnieks@vt.edu wrote:
> On Fri, 09 Jun 2006 17:21:08 MDT, Andreas Dilger said:
> > You mount with the new kernel without "-o extents", and find files with
> > extents "lsattr -R /mnt/tmp | awk '/----e / { print $2 }'", copy those
> > files, mv over old files, unmount.
> 
> How do you "copy those files" when you don't have extent support at that
> point?  Remember - the whole problem here is that if you don't have
> extent support, you can't read the file, it's backward-incompatible.
> (If you *are* able to read the file even without extents, then this whole
> thread is total BS).

The "-o extents" mount option only affects new files that are created
while that option is enabled.  It doesn't affect existing files (even if
they are modified while "-o extents" is set).  It also doesn't affect any
new files after "-o extents" is removed.  Also, directories will not
be extent-mapped, because their allocation pattern doesn't mix well with
extent-mapped files (i.e. they are mostly single-block allocations).

Files that are created with "-o extents" are of course only readable with
a kernel that supports it.  To be safe, the whole filesystem is marked
with an EXT3_FEATURE_INCOMPAT_EXTENTS flag when the first extent file
is created so that users don't inadvertently get strange errors while
accessing the inodes marked with EXT3_EXTENT_FL with an old kernel.
New kernels that understand INCOMPAT_EXTENTS of course can access extent
and non-extent files equally well.
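
The mount-time check described here comes down to a bitmask test: a kernel
refuses the mount if the superblock carries any incompat bit it does not
know.  A rough sketch in shell arithmetic follows; the mask values and the
"supported" sets passed in are illustrative assumptions, and can_mount is a
hypothetical name, not a kernel or e2fsprogs interface.

```shell
# Illustrative incompat feature bits (values assumed for the example).
INCOMPAT_FILETYPE=0x0002
INCOMPAT_RECOVER=0x0004
INCOMPAT_EXTENTS=0x0040

# Succeed only if every incompat bit set on the filesystem ($1)
# is also present in the kernel's supported mask ($2).
can_mount() {
    unknown=$(( $1 & ~$2 ))
    [ "$unknown" -eq 0 ]
}
```

With these values, a filesystem carrying FILETYPE and EXTENTS is rejected
by a kernel whose mask lacks EXTENTS and accepted by one whose mask
includes it, while a filesystem without the EXTENTS bit mounts on both.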

In an emergency it would also be possible to remove the INCOMPAT_EXTENTS
filesystem flag and access all of the non-extent files, but this would
risk filesystem corruption if any of the extent files were modified or
unlinked, as that is the only indication older kernels have of this change.

So, to answer your question, if you _really_ want to get rid of extents
on a filesystem, you mount the filesystem with INCOMPAT_EXTENTS on a new
kernel that supports extents, but without -o extents so new files will
use the old block-map layout, so if "orig-file" is an extent-mapped file:

	cp /mnt/tmp/orig-file /mnt/tmp/temp-block-mapped-file
	mv /mnt/tmp/temp-block-mapped-file /mnt/tmp/orig-file

and now /mnt/tmp/orig-file is no longer extent-mapped.  Do this for all
the extent-mapped files, unmount, use "debugfs -w -R 'feature ^extents' {dev}"
and your filesystem is mountable with any old kernel.
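
The cp/mv step above can be wrapped in a small helper for use on each
extent-mapped file.  This is a sketch only: the helper name and the
temporary file suffix are made up, and whether the copy ends up
block-mapped depends entirely on the filesystem being mounted without
"-o extents" as described above.  On any other filesystem it is just a
plain rewrite.

```shell
# Rewrite one file by copying it and renaming the copy over the original.
# The new inode is allocated under the current mount options, which is the
# whole point of the exercise.  Note that this breaks hard links and needs
# enough free space for a second copy of the file.
rewrite_as_new_file() {
    src=$1
    tmp="$src.blockmap.$$"     # hypothetical temp name in the same directory
    cp -p -- "$src" "$tmp" || { rm -f -- "$tmp"; return 1; }
    mv -f -- "$tmp" "$src"
}
```

The helper leaves the original untouched if the copy fails, so it is safe
to rerun over the list of files until none of them is extent-mapped, and
only then unmount and clear the feature flag.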

No, it's not quite as easy as ext3 journal recovery->ext2 mounting,
but then again "-o extents" isn't something that happens automatically
(at least not for a couple of years), and hopefully distros will be smart
enough never to do this for filesystems like /boot or / that are critical
for mounting on a wide variety of kernels.  Besides which, we don't want
to have to teach GRUB about extent-mapped files.  Conceivably, if this
becomes an issue then it should be possible to add a flag to inodes and
parent directories to add a "no extents" flag that is inherited by new
files that should never be extent mapped.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-10  2:09                         ` [Ext2-devel] " Andreas Dilger
@ 2006-06-10  2:45                           ` Nicholas Miell
  2006-06-10  4:29                             ` Andreas Dilger
  0 siblings, 1 reply; 104+ messages in thread
From: Nicholas Miell @ 2006-06-10  2:45 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Valdis.Kletnieks, Alex Tomas, Jeff Garzik, ext2-devel,
	linux-kernel, Christoph Hellwig, Mingming Cao, linux-fsdevel

On Fri, 2006-06-09 at 20:09 -0600, Andreas Dilger wrote:
> On Jun 09, 2006  21:21 -0400, Valdis.Kletnieks@vt.edu wrote:
> > On Fri, 09 Jun 2006 17:21:08 MDT, Andreas Dilger said:
> > > You mount with the new kernel without "-o extents", and find files with
> > > extents "lsattr -R /mnt/tmp | awk '/----e / { print $2 }'", copy those
> > > files, mv over old files, unmount.
> > 
> > How do you "copy those files" when you don't have extent support at that
> > point?  Remember - the whole problem here is that if you don't have
> > extent support, you can't read the file, it's backward-incompatible.
> > (If you *are* able to read the file even without extents, then this whole
> > thread is total BS).
> 
> The "-o extents" mount option only affects new files that are created
> while that option is enabled.  It doesn't affect existing files (even if
> they are modified while "-o extents" is set).  It also doesn't affect any
> new files after "-o extents" is removed.  Also, directories will not
> be extent-mapped, because their allocation pattern doesn't mix well with
> extent-mapped files (i.e. they are mostly single-block allocations).
> 
> Files that are created with "-o extents" are of course only readable with
> a kernel that supports it.  To be safe, the whole filesystem is marked
> with an EXT3_FEATURE_INCOMPAT_EXTENTS flag when the first extent file
> is created so that users don't inadvertently get strange errors while
> accessing the inodes marked with EXT3_EXTENT_FL with an old kernel.
> New kernels that understand INCOMPAT_EXTENTS of course can access extent
> and non-extent files equally well.
> 
> In an emergency it would also be possible to remove the INCOMPAT_EXTENTS
> filesystem flag and access all of the non-extent files, but this would
> risk filesystem corruption if any of the extent files were modified or
> unlinked, as that is the only indication older kernels have of this change.
> 
> So, to answer your question, if you _really_ want to get rid of extents
> on a filesystem, you mount the filesystem with INCOMPAT_EXTENTS on a new
> kernel that supports extents, but without -o extents so new files will
> use the old block-map layout, so if "orig-file" is an extent-mapped file:
> 
> 	cp /mnt/tmp/orig-file /mnt/tmp/temp-block-mapped-file
> 	mv /mnt/tmp/temp-block-mapped-file /mnt/tmp/orig-file
> 
> and now /mnt/tmp/orig-file is no longer extent-mapped.  Do this for all
> the extent-mapped files, unmount, use "debugfs -w -R 'feature ^extents' {dev}"
> and your filesystem is mountable with any old kernel.
> 
> No, it's not quite as easy as ext3 journal recovery->ext2 mounting,
> but then again "-o extents" isn't something that happens automatically
> (at least not for a couple of years), and hopefully distros will be smart
> enough never to do this for filesystems like /boot or / that are critical
> for mounting on a wide variety of kernels.  Besides which, we don't want
> to have to teach GRUB about extent-mapped files.  Conceivably, if this
> becomes an issue then it should be possible to add a flag to inodes and
> parent directories to add a "no extents" flag that is inherited by new
> files that should never be extent mapped.
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.


I think changing all of this mess to:

[root@localhost root]# tune2fs -O extents /dev/whatever
WARNING: Enabling extents on /dev/whatever will make this filesystem
unreadable in Linux kernel versions before 2.6.19!
Are you sure you want to do this? <y/n>

[root@localhost root]# tune2fs -O ^extents /dev/whatever
WARNING: Disabling extents on /dev/whatever requires you to run e2fsck
on this filesystem before it can be used again!
Are you sure you want to do this? <y/n>

might assuage many of the fears presented in this thread.
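
The prompt itself is easy to prototype outside tune2fs.  The sketch below
is a hypothetical shell wrapper, not an existing tune2fs behaviour: it
prints a warning, reads one line from stdin, and succeeds only on an
answer starting with "y" or "Y".  The function name is made up.

```shell
# Ask for confirmation before an incompatible feature change.
# $1 is the warning text; the answer is read from stdin.
# Returns 0 only for an answer starting with y or Y; EOF counts as "no".
confirm_incompat_change() {
    printf 'WARNING: %s\nAre you sure you want to do this? <y/n> ' "$1" >&2
    IFS= read -r answer || return 1
    case $answer in
        [yY]*) return 0 ;;
        *)     return 1 ;;
    esac
}
```

A caller would run the real feature change only when the function returns
success, so an accidental Enter or a closed stdin leaves the filesystem
untouched.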
-- 
Nicholas Miell <nmiell@comcast.net>


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-10  2:45                           ` Nicholas Miell
@ 2006-06-10  4:29                             ` Andreas Dilger
  0 siblings, 0 replies; 104+ messages in thread
From: Andreas Dilger @ 2006-06-10  4:29 UTC (permalink / raw)
  To: Nicholas Miell
  Cc: Valdis.Kletnieks, Alex Tomas, Jeff Garzik, ext2-devel,
	linux-kernel, Christoph Hellwig, Mingming Cao, linux-fsdevel

On Jun 09, 2006  19:45 -0700, Nicholas Miell wrote:
> I think changing all of this mess to:
> 
> [root@localhost root]# tune2fs -O extents /dev/whatever
> WARNING: Enabling extents on /dev/whatever will make this filesystem
> unreadable in Linux kernel versions before 2.6.19!
> Are you sure you want to do this? <y/n>
> 
> [root@localhost root]# tune2fs -O ^extents /dev/whatever
> WARNING: Disabling extents on /dev/whatever requires you to run e2fsck
> on this filesystem before it can be used again!
> Are you sure you want to do this? <y/n>
> 
> might assuage many of the fears presented in this thread.

If that were true, then I'd be happy to make this the barrier to entry.
Sadly, I don't think that is the only issue, but I'm happy to be shown
to be wrong.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [RFC 0/13] extents and 48bit ext3
  2006-06-09 15:17             ` [Ext2-devel] " Jeff Garzik
  2006-06-09 16:21               ` Mike Snitzer
@ 2006-06-09 16:56               ` Andreas Dilger
  2006-06-09 17:32                 ` [Ext2-devel] " Greg KH
  1 sibling, 1 reply; 104+ messages in thread
From: Andreas Dilger @ 2006-06-09 16:56 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: ext2-devel, linux-kernel, Christoph Hellwig, Mingming Cao,
	linux-fsdevel, Alex Tomas

On Jun 09, 2006  11:17 -0400, Jeff Garzik wrote:
> Not all users are big production houses that plan their filesystem 
> metadata migration months in advance!  I _guarantee_ some users will 
> boot into ext3-with-extents, use it for a while, and then try to 
> downgrade for whatever reason...  only to find they have been LOCKED 
> OUT.  That is a very real world situation, guys.

Except that the only way that they will get extents is if they read some
documentation that tells them to mount with "-o extents", which will also
say "this is incompatible with older kernels - only use it if you aren't
going to revert to older kernels".  If they try to mount such a filesystem
with an older kernel, the kernel will report "trying to mount filesystem
with incompatible feature",
and "e2fsprogs" will report "incompatible feature extents - please upgrade
your e2fsprogs" (for versions newer than Nov 2004).

It's a lot better than e.g. the latest ubuntu which (apparently,
I read) can't boot a kernel older than 2.6.15 because of udev (or
sysfs?) changes.  It's better than e.g. reiserfs vs. reiser4 compatibility
(which doesn't exist).  2.4 kernels probably can't mount a new udev root
filesystem because none of the /dev files exist either.  2.4 kernels can't
mount a filesystem that is using device mapper ("LVM 2.0") instead of
"LVM 1.0".  All 2.2 kernel.org kernels couldn't use any system with RAID,
because any distro worth its salt had upgraded the RAID code to a working
(incompatible) version.

Nobody is forcing users to use extents.   Same with large inodes in ext3,
which give a 7x speedup in samba4 performance - did this cause you any
heartburn yet?   Large inodes + fast EAs have been available for a couple
of years already for people who want to use them, will soon allow
nanosecond timestamps, and maybe one day in the distant future will become
the default, but not yet.  In a few years, the support for extents in ext3
will be pervasive
and most people won't care if they can boot to 2.4.10 or not, and if they
care about this they will also know enough not to enable extents.  The ext3
developers are a very cautious bunch, and don't force anything onto users.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: [Ext2-devel] [RFC 0/13] extents and 48bit ext3
  2006-06-09 16:56               ` Andreas Dilger
@ 2006-06-09 17:32                 ` Greg KH
  0 siblings, 0 replies; 104+ messages in thread
From: Greg KH @ 2006-06-09 17:32 UTC (permalink / raw)
  To: Jeff Garzik, Alex Tomas, Christoph Hellwig, linux-fsdevel,
	ext2-devel, Mingming Cao, linux-kernel

On Fri, Jun 09, 2006 at 10:56:43AM -0600, Andreas Dilger wrote:
> It's a lot better than e.g. the latest ubuntu which (apparently,
> I read) can't boot a kernel older than 2.6.15 because of udev (or
> sysfs?) changes.

If this is true, then it's only because the Ubuntu developers do not
want to support older kernel versions.  Other distros handle this just
fine (Gentoo and Debian for example).  This is not a kernel issue, but
rather a distro design issue.

Which is much different from the fact that I take a "ext3" partition
from my new distro and can't get to the data if I downgrade to an older
distro for whatever reason (or use an older rescue disk.)

Don't confuse distro design decisions with issues forced on an unknowing
user by the ext3 fs kernel developers.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 104+ messages in thread

end of thread, other threads:[~2006-06-20  9:21 UTC | newest]

Thread overview: 104+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <6lDZ6-6Hg-11@gated-at.bofh.it>
     [not found] ` <6lFet-8vZ-1@gated-at.bofh.it>
     [not found]   ` <6lKHf-84q-19@gated-at.bofh.it>
     [not found]     ` <6lQMJ-sm-13@gated-at.bofh.it>
     [not found]       ` <6lR6f-Rx-33@gated-at.bofh.it>
     [not found]         ` <6lRpH-1h2-51@gated-at.bofh.it>
     [not found]           ` <6lRIT-1T1-23@gated-at.bofh.it>
     [not found]             ` <6lS26-2ho-7@gated-at.bofh.it>
     [not found]               ` <6lSvd-2Sh-15@gated-at.bofh.it>
     [not found]                 ` <6lTKz-4Ru-9@gated-at.bofh.it>
     [not found]                   ` <6lTUf-54A-17@gated-at.bofh.it>
     [not found]                     ` <6lU3S-5h5-11@gated-at.bofh.it>
     [not found]                       ` <6lU3X-5h5-35@gated-at.bofh.it>
     [not found]                         ` <6lUnl-5GL-5@gated-at.bofh.it>
     [not found]                           ` <6lUwX-66U-25@gated-at.bofh.it>
     [not found]                             ` <6lUQo-6w3-29@gated-at.bofh.it>
     [not found]                               ` <6lUQp-6w3-35@gated-at.bofh.it>
     [not found]                                 ` <6lUZT-6HS-3@gated-at.bofh.it>
     [not found]                                   ` <6nE4Z-4If-55@gated-at.bofh.it>
2006-06-14 16:45                                     ` [Ext2-devel] [RFC 0/13] extents and 48bit ext3 Bodo Eggert
2006-06-14 17:28                                       ` Andreas Dilger
2006-06-09 19:39 Gerrit Huizenga
2006-06-09 19:45 ` [Ext2-devel] " Jeff Garzik
  -- strict thread matches above, loose matches on Subject: below --
2006-06-09 18:55 Jeff Garzik
2006-06-09 19:42 ` [Ext2-devel] " Gerrit Huizenga
2006-06-09 20:00   ` Jeff Garzik
2006-06-09 20:08     ` Alex Tomas
2006-06-09 20:10       ` [Ext2-devel] " Jeff Garzik
2006-06-09 20:35     ` Theodore Tso
2006-06-09 21:41       ` Jeff Garzik
2006-06-09 21:45         ` [Ext2-devel] " Michael Poole
2006-06-09 21:53           ` Jeff Garzik
2006-06-09  1:20 Mingming Cao
2006-06-09  2:40 ` Valdis.Kletnieks
2006-06-09  8:20   ` Andreas Dilger
2006-06-09 18:35     ` [Ext2-devel] " Stephen C. Tweedie
2006-06-09 19:20       ` Jeff Garzik
2006-06-09 19:28         ` Alex Tomas
2006-06-09  2:49 ` Jeff Garzik
2006-06-09  8:35   ` Andreas Dilger
2006-06-09 15:08     ` Jeff Garzik
2006-06-09 15:25       ` Jeff Garzik
2006-06-09 15:40         ` Linus Torvalds
2006-06-09 15:47           ` Jeff Garzik
2006-06-09 15:55             ` Alex Tomas
2006-06-09 15:56               ` Jeff Garzik
2006-06-09 16:07                 ` Alex Tomas
2006-06-09 16:09                   ` [Ext2-devel] " Jeff Garzik
2006-06-09 18:04                   ` Matthew Frost
2006-06-09 18:14                     ` Andreas Dilger
2006-06-09 18:51                       ` Jeff Garzik
2006-06-09 19:49                         ` Theodore Tso
2006-06-09 20:04                           ` Jeff Garzik
2006-06-09 20:57                             ` Stephen C. Tweedie
2006-06-09 21:49                               ` Jeff Garzik
2006-06-09 21:55                                 ` [Ext2-devel] " Stephen C. Tweedie
2006-06-09 23:44                                   ` Jeff Garzik
2006-06-10  0:45                                     ` [Ext2-devel] " Andreas Dilger
2006-06-10  0:47                                     ` Theodore Tso
2006-06-10  1:09                                       ` Jeff Garzik
2006-06-10  1:30                                         ` [Ext2-devel] " Andreas Dilger
2006-06-10  1:43                                           ` Jeff Garzik
2006-06-10  2:03                                             ` Theodore Tso
2006-06-10  2:11                                               ` [Ext2-devel] " Jeff Garzik
2006-06-10  2:58                                               ` Jeff Garzik
2006-06-09 22:37                             ` Andreas Dilger
2006-06-11 16:02                         ` Arjan van de Ven
2006-06-11 16:30                           ` Nikita Danilov
2006-06-11 16:55                             ` [Ext2-devel] " Arjan van de Ven
2006-06-12 22:06                         ` Pavel Machek
2006-06-14 14:31                           ` Barry K. Nathan
2006-06-14 21:34                             ` [Ext2-devel] " Pavel Machek
2006-06-15  0:28                               ` Barry K. Nathan
2006-06-15  4:55                                 ` Theodore Tso
2006-06-15  7:43                                   ` Barry K. Nathan
2006-06-15  9:15                                 ` Pavel Machek
2006-06-15  9:40                                   ` Barry K. Nathan
2006-06-15  9:50                                     ` [Ext2-devel] " Pavel Machek
2006-06-09 20:52                 ` Stephen C. Tweedie
2006-06-09 21:47                   ` [Ext2-devel] " Jeff Garzik
2006-06-09 16:10           ` Alex Tomas
2006-06-09 16:10             ` Jeff Garzik
2006-06-09 16:24               ` Erik Mouw
2006-06-09 16:24               ` Chase Venters
2006-06-09 16:25               ` Alex Tomas
2006-06-09 16:28                 ` Jeff Garzik
2006-06-09 16:50                   ` Alex Tomas
2006-06-09 16:53                     ` [Ext2-devel] " Jeff Garzik
2006-06-09 17:01                       ` Alex Tomas
2006-06-09 17:10                         ` Jeff Garzik
2006-06-09 16:25             ` Linus Torvalds
2006-06-09 16:48               ` Alex Tomas
2006-06-09 16:55                 ` Jeff Garzik
2006-06-09 17:12                   ` [Ext2-devel] " Alex Tomas
2006-06-09 17:12                     ` Jeff Garzik
2006-06-09 19:57                   ` Theodore Tso
2006-06-09 20:09                     ` Jeff Garzik
2006-06-09 20:14                       ` Alex Tomas
2006-06-19  7:48                         ` [Ext2-devel] " Helge Hafting
2006-06-09 20:38                     ` Joel Becker
2006-06-09 20:50                       ` Dave Jones
2006-06-09 21:32                         ` [Ext2-devel] " Jeff Garzik
2006-06-09 22:56                           ` Andreas Dilger
2006-06-09 23:09                             ` Jeff Garzik
2006-06-09 23:37                               ` [Ext2-devel] " Andreas Dilger
2006-06-09 21:03                       ` Theodore Tso
2006-06-09 21:24                         ` Joel Becker
2006-06-09 21:36                           ` [Ext2-devel] " Chase Venters
2006-06-09 21:51                           ` Theodore Tso
2006-06-09 22:07                             ` Joel Becker
2006-06-09 22:31                               ` [Ext2-devel] " Theodore Tso
2006-06-09 22:47                                 ` Joel Becker
2006-06-09 23:54                                   ` [Ext2-devel] " Theodore Tso
2006-06-09 16:54               ` Linus Torvalds
2006-06-09 17:04                 ` Alex Tomas
2006-06-09 17:30                   ` [Ext2-devel] " Linus Torvalds
2006-06-09 17:41                     ` Matthew Wilcox
2006-06-09 18:04                       ` [Ext2-devel] " Linus Torvalds
2006-06-09 18:17                       ` Michael Poole
2006-06-09 18:10                 ` Andreas Dilger
2006-06-09 18:22                   ` Linus Torvalds
2006-06-09 18:30                     ` Alex Tomas
2006-06-09 18:38                       ` Linus Torvalds
2006-06-09 18:50                         ` [Ext2-devel] " Chase Venters
2006-06-09 19:00                           ` Chase Venters
2006-06-10 13:33                             ` Adrian Bunk
2006-06-09 19:01                           ` Jeff Garzik
2006-06-10 19:27                             ` Kyle Moffett
2006-06-10 19:44                               ` Linus Torvalds
2006-06-10 20:02                                 ` [Ext2-devel] " Linus Torvalds
2006-06-09 19:21                           ` Alan Cox
2006-06-09 19:13                             ` [Ext2-devel] " Chase Venters
2006-06-09 19:24                             ` Alex Tomas
2006-06-09 19:25                               ` Jeff Garzik
2006-06-09 19:35                                 ` Alex Tomas
2006-06-09 19:35                                   ` [Ext2-devel] " Jeff Garzik
2006-06-11 20:14                                   ` grundig
2006-06-09 18:43                       ` Jeff Garzik
2006-06-09 18:50                       ` Diego Calleja
2006-06-09 18:40                   ` Jeff Garzik
2006-06-09 18:59                     ` Andrew Morton
2006-06-09 19:16                       ` Jeff Garzik
2006-06-09 20:27                         ` [Ext2-devel] " Chase Venters
2006-06-09 20:44                       ` Alan Cox
2006-06-11 15:52                         ` [Ext2-devel] " Arjan van de Ven
2006-06-09 18:41                   ` Jeff Garzik
2006-06-09 17:12               ` Jeff Anderson-Lee
2006-06-09 18:02               ` Andrew Morton
2006-06-09 15:28       ` Alex Tomas
2006-06-09 15:44         ` Jeff Garzik
2006-06-09 15:53           ` Alex Tomas
2006-06-09 15:52             ` Jeff Garzik
2006-06-09 16:02               ` Alex Tomas
2006-06-09 16:04                 ` [Ext2-devel] " Jeff Garzik
2006-06-09 15:53         ` Gerrit Huizenga
2006-06-09 16:03           ` Jeff Garzik
2006-06-09 16:09           ` Linus Torvalds
2006-06-09 17:58             ` Gerrit Huizenga
2006-06-09 18:25               ` [Ext2-devel] " Chase Venters
2006-06-10 13:46               ` Adrian Bunk
2006-06-13 13:34               ` Helge Hafting
2006-06-09 20:32       ` Stephen C. Tweedie
2006-06-09 20:46         ` Linus Torvalds
2006-06-20  6:15           ` [Ext2-devel] " Qi Yong
2006-06-20  8:26             ` Laurent Vivier
2006-06-20  8:30               ` Jeff Garzik
2006-06-20  9:21                 ` Laurent Vivier
2006-06-09  9:13 ` Christoph Hellwig
2006-06-09 10:07   ` Andrew Morton
2006-06-09 15:40     ` Jeff Garzik
2006-06-09 16:56       ` Andrew Morton
2006-06-09 17:07         ` Jeff Garzik
2006-06-09 17:35           ` Andrew Morton
2006-06-09 17:48             ` Jeff Garzik
2006-06-09 17:59               ` Jeff Garzik
2006-06-09 18:27                 ` [Ext2-devel] " Mike Snitzer
2006-06-09 18:54                   ` Jeff Garzik
2006-06-09 19:22                     ` Alex Tomas
2006-06-09 22:49                       ` Valdis.Kletnieks
2006-06-09 23:34                         ` [Ext2-devel] " Andreas Dilger
2006-06-10 13:49                   ` Adrian Bunk
2006-06-10 13:51                     ` Christoph Hellwig
2006-06-10 14:54                       ` Jeff Garzik
2006-06-10 18:01                         ` [Ext2-devel] " Andreas Dilger
2006-06-09 11:26   ` Alex Tomas
2006-06-09 14:23     ` [Ext2-devel] " Jeff Garzik
2006-06-09 14:33       ` Alex Tomas
2006-06-09 14:34       ` Alex Tomas
2006-06-09 14:35         ` Jeff Garzik
2006-06-09 14:57           ` Alex Tomas
2006-06-09 15:17             ` [Ext2-devel] " Jeff Garzik
2006-06-09 16:21               ` Mike Snitzer
2006-06-09 16:27                 ` Jeff Garzik
2006-06-09 16:48                   ` Alex Tomas
2006-06-09 16:33                 ` Alex Tomas
2006-06-09 16:37                   ` [Ext2-devel] " Jeff Garzik
2006-06-09 22:52                   ` Valdis.Kletnieks
2006-06-09 23:21                     ` Andreas Dilger
2006-06-10  1:21                       ` Valdis.Kletnieks
2006-06-10  2:09                         ` [Ext2-devel] " Andreas Dilger
2006-06-10  2:45                           ` Nicholas Miell
2006-06-10  4:29                             ` Andreas Dilger
2006-06-09 16:56               ` Andreas Dilger
2006-06-09 17:32                 ` [Ext2-devel] " Greg KH
