public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: Takashi Sato <sho@bsd.tnes.nec.co.jp>,
	cmm@us.ibm.com, linux-kernel@vger.kernel.org,
	ext2-devel@lists.sourceforge.net,
	Laurent Vivier <Laurent.Vivier@bull.net>
Subject: Re: [Ext2-devel] [PATCH 1/2] ext2/3: Support 2^32-1 blocks(Kernel)
Date: Sat, 18 Mar 2006 12:07:29 -0500	[thread overview]
Message-ID: <20060318170729.GI21232@thunk.org> (raw)
In-Reply-To: <20060316225913.GV30801@schatzie.adilger.int>

On Thu, Mar 16, 2006 at 03:59:13PM -0700, Andreas Dilger wrote:
> While I agree with that in theory, in practise we never end up doing
> this and it just ends up delaying the acceptance of the trivial patches.
> It may also be a burden later on when some of the features that could
> be e.g. ROCOMPAT are bundled with an INCOMPAT change and we then
> make the filesystem gratuitously INCOMPAT.
> 
> In the end, I don't think having a couple of separate flags is any more
> effort than having a single one.  As yet we only have about 6 of each 32
> feature bits used, and if we get close to running out we can make an
> EXT3_FEATURE_{,RO,IN}COMPAT_NEXT_WORD flag to continue it on.

The overhead is not running out of feature bit flags.  After all, it's
easy to add more if we need to; we just define the MSB as meaning
"check the auxiliary features compat/rocompat/incompat mask", and then
define a new 32-bit extension bitmask in the superblock.

What I'm trying to simplify is the overhead of users trying to
understand a tangled mess of features, some compat, some incompat,
etc.

> Note that I'm not against this in practise, but I wouldn't hold up any
> feature for this reason.  How long has large i_blocks been pending,
> and usecond timestamps?  Many years already, even though they are trivial
> to implement, so I'm hesitant to tie them together and delay further.

i_blocks has been pending because the people who could push it haven't
had the time, and usec timestamps because the trivial way (without an
at least an ROCOMPAT flag).

> I think i_blocks can be considered an ROCOMPAT feature, and the large
> inode reservation for usecond timestamps could be COMPAT I think (since
> an unsupporting kernel would still update all the timestamps consistently
> even if the useconds on disk would be some constant instead of 0.

i_blocks can ROCOMPAT only if it is acceptable for stat(2) to return
erroneous i_blocks return values.  I'm not entirely convinced that's a
good thing, and at the very least it would be extremely confusing, but
maybe.

usecond timestamps must be at least ROCOMPAT, because of the
requirement that all newly created inodes must reserve extra space and
guarantee that i_extra_isize must be at least n bytes (where n is the
size of the guaranteed extra inode fields).  If you don't do that,
then when the filesystem is mounted one again on a kernel that does
understand usec timestamps, some inodes will have room for the usec
time fields, and other inodes won't (because they have too much of the
space used for EA's), and that will cause serious problems for make(1).

> I think testing with reiserfs showed that tail packing was a net loss in
> most cases, since basically every benchmark I've ever seen with reiserfs
> disables tail packing or suffers.  For space constrained systems (if
> there ever exists such a thing again ;-) it would probably be better to
> go to compressed files.

I have to wonder if that's because of the way reiserfs implemented
tail-packing more than anything else.  I don't belive fragments hurt
performance on UFS systems quite as much as it does on reiserfs
systems.  I'm not worried about this as much for space constrained
systems, but for cases where we find that increasing the blocksize to
8k or even larger (32k?  64k) really helps, but we don't want to pay
the internal fragmentation penalty for small files.  There are other
ways to solve the problem, yes, such as by assuming that we can use a
different filesystem for database or video streams separate from the
/, /usr, and/or /var filesystems, for example.  

If we are ready to forever forswear wanting to use large block sizes,
then maybe we don't need to worry about fragmentations support (or
maybe the 1.8" pedabyte disk drives will show up and be cheap enough
that we just won't care about wasting space on small files).  But
that's I think a decision which we need to formally make.

					- Ted

  reply	other threads:[~2006-03-18 17:07 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-03-15 12:39 [PATCH 1/2] ext2/3: Support 2^32-1 blocks(Kernel) Takashi Sato
2006-03-15 12:56 ` [Ext2-devel] " Laurent Vivier
2006-03-16  2:19 ` Mingming Cao
2006-03-16 12:11   ` Takashi Sato
2006-03-16 13:53     ` Theodore Ts'o
2006-03-16 18:35     ` Andreas Dilger
2006-03-16 21:26       ` Theodore Ts'o
2006-03-16 22:59         ` Andreas Dilger
2006-03-18 17:07           ` Theodore Ts'o [this message]
2006-03-20  6:36             ` Andreas Dilger
2006-03-20 22:38               ` Stephen C. Tweedie
2006-03-20 23:48                 ` Andreas Dilger
2006-03-21 17:05                   ` Stephen C. Tweedie
2006-03-21 18:38                     ` Theodore Ts'o
2006-03-21 19:47                       ` Stephen C. Tweedie
2006-03-21 20:40                         ` Andreas Dilger
2006-03-21 20:16                       ` Alfred M. Szmidt
2006-03-21 23:05                       ` Olivier Galibert
2006-03-21 23:35                         ` Alfred M. Szmidt
2006-03-25 14:51                       ` cascardo
2006-03-26 16:27                         ` Andreas Dilger
2006-03-27 19:59                           ` Stephen C. Tweedie
2006-03-27 20:36                           ` Alfred M. Szmidt
2006-03-27 19:55                         ` Stephen C. Tweedie
2006-03-27 20:05                           ` Alfred M. Szmidt
2006-03-27 20:40                             ` Stephen C. Tweedie
2006-03-28  0:14                           ` cascardo
2006-03-21 20:26                     ` Andreas Dilger
2006-03-21  4:03                 ` Theodore Ts'o
2006-03-17  9:35       ` Laurent Vivier
2006-03-19  2:20 ` Theodore Ts'o
2006-03-20 10:11   ` Takashi Sato
2006-03-26  3:01     ` Theodore Ts'o
  -- strict thread matches above, loose matches on Subject: below --
2006-03-26 22:15 Chuck Ebbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060318170729.GI21232@thunk.org \
    --to=tytso@mit.edu \
    --cc=Laurent.Vivier@bull.net \
    --cc=cmm@us.ibm.com \
    --cc=ext2-devel@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sho@bsd.tnes.nec.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox