Same magic in statfs() call for ext?

linux-ext4.vger.kernel.org archive mirror

* Same magic in statfs() call for ext?
@ 2009-03-16 13:36 Jan Kara
  2009-03-16 16:13 ` Eric Sandeen
From: Jan Kara @ 2009-03-16 13:36 UTC
  To: linux-ext4

  Hi,

  I've just noticed that EXT2_SUPER_MAGIC == EXT3_SUPER_MAGIC ==
EXT4_SUPER_MAGIC. That is just fine for the disk format, but as a result we
also return the same magic in the statfs() syscall, and thus a simple
application has a hard time recognizing whether it works on ext2, ext3 or
ext4 (it would have to parse /proc/mounts, and that is non-trivial if not
impossible when it comes to bind mounts). So shouldn't we return different
magic numbers depending on how the filesystem is currently mounted?
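To make the problem concrete, here is a minimal sketch of what an application can do with statfs() today. It assumes Linux with glibc, where statfs() is declared in <sys/vfs.h>; 0xEF53 is the value <linux/magic.h> uses for EXT2_SUPER_MAGIC, EXT3_SUPER_MAGIC and EXT4_SUPER_MAGIC alike. The helper name is_ext_family() and the local constant are hypothetical.

```c
#include <sys/vfs.h>   /* statfs() on Linux/glibc */

/* Same value as EXT2_SUPER_MAGIC, EXT3_SUPER_MAGIC and EXT4_SUPER_MAGIC
 * in <linux/magic.h>; defined locally (hypothetical name) so the sketch
 * does not depend on kernel headers being installed. */
#define EXT_FAMILY_MAGIC 0xEF53

/* Hypothetical helper: returns 1 if `path` lives on ext2, ext3 or ext4,
 * 0 if it lives on some other filesystem, and -1 if statfs() fails.
 * It cannot tell WHICH ext variant it is: all three report the same
 * f_type. */
static int is_ext_family(const char *path)
{
    struct statfs sfs;

    if (statfs(path, &sfs) != 0)
        return -1;
    return sfs.f_type == EXT_FAMILY_MAGIC ? 1 : 0;
}
```

A caller can learn "this is some ext filesystem", but nothing in struct statfs distinguishes ext3 from ext4, which is exactly the gap described above.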
  Now you may ask why the application should care, and I agree that in the
ideal world it should not. But for example there's a thread on the GTK mailing
list [1] where they discuss the problem that with delayed allocation and
ext4, a user can easily lose his data after a crash (Ted wrote about it here in
some other mail some time ago). So they would like to call fsync() after
the file is written, but on ext3 that is quite heavy, and because of autosave,
saving happens quite often. So they'd do fsync() only if the filesystem
is mounted as ext4...
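For comparison, a rough sketch of the /proc/mounts route mentioned above, and of why it is fragile: pick the entry whose mount point is the longest prefix of an already canonical absolute path. It uses glibc's setmntent()/getmntent()/endmntent() from <mntent.h>; the helper name mounted_fs_type() is hypothetical, and bind mounts or directories mounted over more than once can still make it return the wrong answer.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy into `type` (buffer of `len` bytes) the
 * filesystem type of the mount whose mount point is the longest prefix of
 * `abs_path`, according to `mounts_file` (normally "/proc/mounts").
 * Returns 0 on success, -1 if the file cannot be read or nothing matched.
 * Assumes `abs_path` is absolute and canonical (no symlinks, no "..");
 * bind mounts are NOT resolved, which is the hard part. */
static int mounted_fs_type(const char *mounts_file, const char *abs_path,
                           char *type, size_t len)
{
    FILE *f = setmntent(mounts_file, "r");
    struct mntent *m;
    size_t best = 0;
    int found = 0;

    if (f == NULL)
        return -1;
    while ((m = getmntent(f)) != NULL) {
        size_t n = strlen(m->mnt_dir);
        /* The mount point must match whole path components, so that
         * "/home" does not match "/homework". A length-1 mnt_dir is "/". */
        int is_prefix = strncmp(abs_path, m->mnt_dir, n) == 0 &&
                        (n == 1 || abs_path[n] == '/' || abs_path[n] == '\0');
        /* ">=" lets a later entry for the same directory win, since it was
         * mounted on top of the earlier one. */
        if (is_prefix && n >= best) {
            best = n;
            snprintf(type, len, "%s", m->mnt_type);
            found = 1;
        }
    }
    endmntent(f);
    return found ? 0 : -1;
}
```

Usage would be something like mounted_fs_type("/proc/mounts", dir, buf, sizeof buf) followed by strcmp(buf, "ext4"), after canonicalizing dir, for example with realpath().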
  So I'm writing here to hear some opinions on returning different magic
numbers from statfs().

								Honza

[1] http://mail.gnome.org/archives/gtk-devel-list/2009-March/msg00082.html
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Same magic in statfs() call for ext?
  2009-03-16 13:36 Same magic in statfs() call for ext? Jan Kara
@ 2009-03-16 16:13 ` Eric Sandeen
  2009-03-16 16:27   ` Jan Kara
From: Eric Sandeen @ 2009-03-16 16:13 UTC
  To: Jan Kara; +Cc: linux-ext4

Jan Kara wrote:
>   Hi,
> 
>   I've just noticed that EXT2_SUPER_MAGIC == EXT3_SUPER_MAGIC ==
> EXT4_SUPER_MAGIC. 

Just noticed?  *grin*

> That is just fine for the disk format, but as a result we
> also return the same magic in the statfs() syscall, and thus a simple
> application has a hard time recognizing whether it works on ext2, ext3 or
> ext4 (it would have to parse /proc/mounts, and that is non-trivial if not
> impossible when it comes to bind mounts).

I have a guess as to why they want to know, and ...

> So shouldn't we return different
> magic numbers depending on how the filesystem is currently mounted?
>   Now you may ask why the application should care, and I agree that in the
> ideal world it should not. But for example there's a thread on the GTK mailing
> list [1] where they discuss the problem that with delayed allocation and
> ext4, a user can easily lose his data after a crash

... sadly I was right.  :)

> (Ted wrote about it here in
> some other mail some time ago). So they would like to call fsync() after
> the file is written, but on ext3 that is quite heavy, and because of autosave,
> saving happens quite often. So they'd do fsync() only if the filesystem
> is mounted as ext4...
>   So I'm writing here to hear some opinions on returning different magic
> numbers from statfs().
> 
> 								Honza
> 
> [1] http://mail.gnome.org/archives/gtk-devel-list/2009-March/msg00082.html

As an aside, Ted also pointed out that ext4-without-delalloc also hurts
on fsync just like ext3 does, so testing "ext3 vs. ext4" isn't quite
enough in general.
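One way an application could refine the "ext3 vs. ext4" test along the lines of Ted's point is to also look at the mount options of the matching /proc/mounts entry. A minimal sketch, assuming glibc's struct mntent and hasmntopt() from <mntent.h>; the helper name ext4_delalloc_likely() is hypothetical, and the rule it encodes (only ext4 without "nodelalloc" counts) is a deliberate simplification.

```c
#include <mntent.h>
#include <string.h>

/* Hypothetical helper: given one mount-table entry (e.g. as returned by
 * getmntent() on /proc/mounts), guess whether ext4 delayed allocation is
 * in effect. Assumption: only an ext4 mount without the "nodelalloc"
 * option counts; every other filesystem is reported as 0, even though
 * some of them (XFS, for one) also delay allocation. */
static int ext4_delalloc_likely(const struct mntent *m)
{
    if (strcmp(m->mnt_type, "ext4") != 0)
        return 0;
    /* hasmntopt() returns NULL when the option is absent. */
    return hasmntopt(m, "nodelalloc") == NULL;
}
```

This still only approximates what the application actually wants to know, which is whether fsync() is cheap and whether skipping it is risky.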

I have been a bit dismayed that app writers just want the old ext3
behavior (which still has a window for loss, doesn't it?) so that they
can get away without fsyncing.  And talking to KDE folks and others, I
think that if ext3 didn't hurt so much w/ fsync, they would just happily
do the right POSIX-defined thing and add fsync() when needed.

But instead, since they are now justifiably afraid of fsync, we are in
this quandary.  (maybe this is over-simplifying a bit).

But off the top of my head, I think that I would prefer to see
applications generally do the right, POSIX-conformant thing w.r.t. data
integrity (i.e. fsync()) unless, via statfs, they find out "fsync hurts,
and we're likely to be reasonably safe without it"

IOW, adding exceptions for ext3 sounds better to me than munging ext4,
xfs, btrfs, and all future filesystems to conform to some behavior which
isn't in any API or spec ...
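For reference, the conventional POSIX way to replace a file's contents safely is to write a temporary file, fsync() it, and rename() it over the original. A minimal sketch; the function save_file() and its do_fsync parameter are hypothetical, with do_fsync standing in for the kind of per-filesystem exception discussed here.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Minimal sketch of "write new file, fsync, rename over the old one".
 * `tmp_path` must be on the same filesystem as `path` so that rename()
 * atomically replaces the old contents. `do_fsync` is a hypothetical
 * policy knob: pass 0 only if the application has decided fsync() is too
 * expensive and accepts that, on some filesystems, a crash may leave a
 * zero-length file. Returns 0 on success, -1 on error. */
static int save_file(const char *path, const char *tmp_path,
                     const void *buf, size_t len, int do_fsync)
{
    const char *p = buf;
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    while (len > 0) {                  /* handle short writes */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    /* Push the new data to stable storage before it becomes visible
     * under the final name. */
    if (do_fsync && fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) != 0 || rename(tmp_path, path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

Making the rename itself durable would additionally require an fsync() of the containing directory; the sketch leaves that out.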

-Eric


* Re: Same magic in statfs() call for ext?
  2009-03-16 16:13 ` Eric Sandeen
@ 2009-03-16 16:27   ` Jan Kara
  2009-03-30 18:23     ` Andreas Dilger
From: Jan Kara @ 2009-03-16 16:27 UTC
  To: Eric Sandeen; +Cc: linux-ext4

On Mon 16-03-09 11:13:13, Eric Sandeen wrote:
> Jan Kara wrote:
> >   Hi,
> > 
> >   I've just noticed that EXT2_SUPER_MAGIC == EXT3_SUPER_MAGIC ==
> > EXT4_SUPER_MAGIC. 
> Just noticed?  *grin*
  ;-)

> > That is just fine for the disk format, but as a result we
> > also return the same magic in the statfs() syscall, and thus a simple
> > application has a hard time recognizing whether it works on ext2, ext3 or
> > ext4 (it would have to parse /proc/mounts, and that is non-trivial if not
> > impossible when it comes to bind mounts).
> 
> I have a guess as to why they want to know, and ...
> 
> > So shouldn't we return different
> > magic numbers depending on how the filesystem is currently mounted?
> >   Now you may ask why the application should care, and I agree that in the
> > ideal world it should not. But for example there's a thread on the GTK mailing
> > list [1] where they discuss the problem that with delayed allocation and
> > ext4, a user can easily lose his data after a crash
> 
> ... sadly I was right.  :)
> 
> > (Ted wrote about it here in
> > some other mail some time ago). So they would like to call fsync() after
> > the file is written, but on ext3 that is quite heavy, and because of autosave,
> > saving happens quite often. So they'd do fsync() only if the filesystem
> > is mounted as ext4...
> >   So I'm writing here to hear some opinions on returning different magic
> > numbers from statfs().
> > 
> > 								Honza
> > 
> > [1] http://mail.gnome.org/archives/gtk-devel-list/2009-March/msg00082.html
> 
> As an aside, Ted also pointed out that ext4-without-delalloc also hurts
> on fsync just like ext3 does, so testing "ext3 vs. ext4" isn't quite
> enough in general.
  Yes, I know, but it's at least an approximation.

> I have been a bit dismayed that app writers just want the old ext3
> behavior (which still has a window for loss, doesn't it?) so that they
> can get away without fsyncing.  And talking to KDE folks and others, I
> think that if ext3 didn't hurt so much w/ fsync, they would just happily
> do the right POSIX-defined thing and add fsync() when needed.
> 
> But instead, since they are now justifiably afraid of fsync, we are in
> this quandary.  (maybe this is over-simplifying a bit).
> 
> But off the top of my head, I think that I would prefer to see
> applications generally do the right, POSIX-conformant thing w.r.t. data
> integrity (i.e. fsync()) unless, via statfs, they find out "fsync hurts,
> and we're likely to be reasonably safe without it"
> 
> IOW, adding exceptions for ext3 sounds better to me than munging ext4,
> xfs, btrfs, and all future filesystems to conform to some behavior which
> isn't in any API or spec ...
  Yes, I agree that if they want data on disk, they should use fsync(). But
as you say for ext3 this is not really usable so they have to somehow
recognize that "they are on a filesystem where fsync() sucks" and avoid it
as much as possible. And I feel slightly in favor of giving them enough rope
(i.e., different magic numbers in statfs) to hang themselves ;-).

									Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: Same magic in statfs() call for ext?
  2009-03-16 16:27   ` Jan Kara
@ 2009-03-30 18:23     ` Andreas Dilger
From: Andreas Dilger @ 2009-03-30 18:23 UTC
  To: Jan Kara; +Cc: Eric Sandeen, linux-ext4

On Mar 16, 2009  17:27 +0100, Jan Kara wrote:
> On Mon 16-03-09 11:13:13, Eric Sandeen wrote:
> > But off the top of my head, I think that I would prefer to see
> > applications generally do the right, POSIX-conformant thing w.r.t. data
> > integrity (i.e. fsync()) unless, via statfs, they find out "fsync hurts,
> > and we're likely to be reasonably safe without it"
> > 
> > IOW, adding exceptions for ext3 sounds better to me than munging ext4,
> > xfs, btrfs, and all future filesystems to conform to some behavior which
> > isn't in any API or spec ...
>
>   Yes, I agree that if they want data on disk, they should use fsync(). But
> as you say for ext3 this is not really usable so they have to somehow
> recognize that "they are on a filesystem where fsync() sucks" and avoid it
> as much as possible. And I feel slightly in favor of giving them enough rope
> (i.e., different magic numbers in statfs) to hang themselves ;-).

One possibility that I've thought of in the past is to have a "dynamic
data=journal" mode when fsync is being called and files are small.
What this means is that small file data will be written to the journal
on fsync instead of journaling only the metadata and flushing the data
to the filesystem in ordered mode.

While it means data is written twice to disk (once to the journal, once to
the fs), if there is a lot of fsync going on and the files are small, then
it may still be faster than doing the seeks needed to write each file's
data in place before the commit.
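The trade-off can be made concrete with a toy cost model. This is an illustrative sketch only: the disk_model struct, the function names and all parameter values are hypothetical, and the model assumes that the later checkpoint write (the "once to fs" copy) costs transfer time but that its seek is amortised across many files.

```c
/* Toy model of fsync cost; every parameter is an assumption for
 * illustration, not a measurement of any real disk or of ext3/ext4. */
struct disk_model {
    double seek_ms;       /* cost of one seek */
    double bytes_per_ms;  /* sequential transfer rate */
};

/* Ordered mode: seek to the data's home location and write it, then seek
 * to the journal to commit the metadata. */
static double ordered_cost_ms(const struct disk_model *d, double bytes)
{
    return 2.0 * d->seek_ms + bytes / d->bytes_per_ms;
}

/* Journalled data: one seek to the journal, where data and commit are
 * written sequentially; the data is written a second time at checkpoint.
 * That second transfer is counted, its seek is assumed amortised. */
static double journal_cost_ms(const struct disk_model *d, double bytes)
{
    return d->seek_ms + 2.0 * bytes / d->bytes_per_ms;
}

/* Under this model journalling the data wins exactly when
 * bytes < seek_ms * bytes_per_ms, i.e. the file is smaller than one
 * seek's worth of sequential transfer. */
static int prefer_data_journal(const struct disk_model *d, double bytes)
{
    return journal_cost_ms(d, bytes) < ordered_cost_ms(d, bytes);
}
```

With an assumed 8 ms seek and 100,000 bytes/ms (about 100 MB/s), the break-even size is 8 × 100,000 = 800,000 bytes: a 4 KiB file favours journalling the data, while a 10 MB file does not.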

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


