From: Jamie Lokier <jamie@shareable.org>
To: Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Theodore Tso <tytso@MIT.EDU>
Subject: Re: [RFC] [PATCH] vfs: Call filesystem callback when backing device caches should be flushed
Date: Wed, 21 Jan 2009 23:55:31 +0000
Message-ID: <20090121235531.GB20407@shareable.org>
In-Reply-To: <20090121232524.GQ10158@disturbed>
Dave Chinner wrote:
> If the inode is dirty and fsync does nothing, then that filesystem
> is *broken*. If writing to the inode doesn't dirty it, then the
> filesystem is broken. Fix the broken filesystem.
*Wrong*. Very, very wrong.
You do not write totally unchanged inode bytes just for the sake of
causing a NOP transaction to make the disk write the fsync as a
side-effect of a broken paradigm. That's _three_ pointless I/Os (one
redundant barrier and two writes), and probably a 50x slowdown in write
performance due to seeking. Now whose filesystem is broken?
> > For efficient fdatasync() you _never_ want a transaction if possible,
> > because it forces the disk head to seek between alternating regions of
> > the disk, two seeks per fsync().
>
> If there is dirty metadata that needs to be logged or flushed,
> then fdatasync() needs to do something. If it doesn't do it
> correctly, then that *filesystem is broken*. Fix the broken
> filesystem.
A series of writes over existing data and fdatasync() should *never*
write to the transaction log, unless you mounted something like ext3
data=journal, which isn't usual.
There is no dirty metadata to write. It is data only. fdatasync()
*means* "do NOT write metadata that is not needed for data retrieval",
that's its whole point. A filesystem which keeps seeking to its
inode area _and_ its journal area _and_ the data area on every
fdatasync() is a poor design indeed.
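
As a minimal userspace sketch of the access pattern in question, assuming
a POSIX system that provides pwrite() and fdatasync(): the file is created
and extended once (new size and block allocation, so fsync()), then
overwritten in place (no size change, so fdatasync()). The names
write_all() and demo() are hypothetical, and whether a given filesystem
really avoids its journal on the second sync is up to its implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf at offset off, retrying on EINTR. */
static int write_all(int fd, const char *buf, size_t len, off_t off)
{
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = pwrite(fd, buf, len, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        buf += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical demo: returns 0 if both syncs succeed and the file holds
 * the expected bytes afterwards, -1 otherwise.
 */
int demo(const char *path)
{
    char check[8];
    int rc = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    /* First write extends the file: size and allocation are new metadata. */
    if (write_all(fd, "aaaaaaaa", 8, 0) == 0 && fsync(fd) == 0 &&
        /* Overwrite in place: data only, nothing new needed to read it back. */
        write_all(fd, "bbbb", 4, 2) == 0 && fdatasync(fd) == 0 &&
        pread(fd, check, sizeof(check), 0) == (ssize_t)sizeof(check) &&
        memcmp(check, "aabbbbaa", sizeof(check)) == 0)
        rc = 0;

    close(fd);
    unlink(path);
    return rc;
}
```

The second sync is the case under discussion: the only metadata that
changed is the timestamp, which is not needed to retrieve the data.
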
> > So you can't rely on journalling transactions to flush.
>
> The VFS doesn't even know about transactions....
Whoever brought them up said they can be relied on to flush writes
during fsync/fdatasync. Just saying they can't, is all...
> > > Finally, I prefer maintainers of the filesystems themselves to
> > > decide whether their filesystem needs flushing and thus
> > > knowingly impose this performance penalty on them...
> >
> > I say it should flush by default unless a filesystem hooks an
> > alternative strategy. Certainly, it's silly to have the same code
> > duplicated in nearly every filesystem
>
> So write a *generic helper* for those filesystems that do the same
> thing and hook it to their ->fsync method. Don't hard code it in the
> VFS so other filesystem dev's have to come along afterwards and turn
> it off.
Are there any at the moment which would turn it off?
If so that's a fine idea.
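
To make the two options concrete, here is a rough userspace model, not
kernel code. Every name in it (struct blockdev, struct file_ops,
generic_fsync_flush(), journalled_fsync(), fsync_model()) is hypothetical
and only mimics the shape of an ->fsync hook. A filesystem can point its
hook at the shared helper, install its own strategy, or leave the hook
empty, in which case the model falls back to flushing by default.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a block device and an open file. */
struct blockdev { int cache_flushes; };
struct file;
struct file_ops { int (*fsync)(struct file *f, int datasync); };
struct file {
    const struct file_ops *ops;
    struct blockdev *bdev;
    int dirty_pages;
    int journal_commits;
};

/* Shared helper: write the data out, then flush the device write cache. */
int generic_fsync_flush(struct file *f, int datasync)
{
    (void)datasync;
    assert(f != NULL && f->bdev != NULL);
    f->dirty_pages = 0;
    f->bdev->cache_flushes++;
    return 0;
}

/*
 * Alternative strategy: a journal commit is assumed to issue its own
 * barrier. With datasync set there is no commit, so an explicit flush
 * is still needed.
 */
int journalled_fsync(struct file *f, int datasync)
{
    assert(f != NULL && f->bdev != NULL);
    f->dirty_pages = 0;
    if (datasync)
        f->bdev->cache_flushes++;
    else
        f->journal_commits++;
    return 0;
}

const struct file_ops simple_fs_ops = { .fsync = generic_fsync_flush };
const struct file_ops journalled_fs_ops = { .fsync = journalled_fsync };
const struct file_ops no_hook_fs_ops = { .fsync = NULL };

/* Dispatch: use the filesystem's hook if set, else flush by default. */
int fsync_model(struct file *f, int datasync)
{
    if (f->ops->fsync)
        return f->ops->fsync(f, datasync);
    return generic_fsync_flush(f, datasync);
}
```

Dropping the fallback in fsync_model() gives the opt-in variant, where
only filesystems that hook the helper pay for the flush.
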
-- Jamie