From: Dave Chinner <david@fromorbit.com>
To: Chad Talbott <ctalbott@google.com>
Cc: linux-kernel@vger.kernel.org, Andrew Morton <akpm@google.com>,
Michael Rubin <mrubin@google.com>
Subject: Re: Metadata in sys_sync_file_range and fadvise(DONTNEED)
Date: Thu, 6 Nov 2008 15:20:54 +1100
Message-ID: <20081106042054.GB2373@disturbed>
In-Reply-To: <1786ab030811051719w63ada9c8ldf75f15367adbb8b@mail.gmail.com>

On Wed, Nov 05, 2008 at 05:19:20PM -0800, Chad Talbott wrote:
> On Sun, Nov 2, 2008 at 2:45 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Fri, Oct 31, 2008 at 01:54:14PM -0700, Chad Talbott wrote:
> >> Andrew suggests a new SYNC_FILE_RANGE_METADATA flag for
> >> sys_sync_file_range(), and leaving posix_fadvise() alone.
> >
> > What is the interface that a filesystem will see? No filesystem has
> > a "metadata sync" method - is this going to fall through to some new
> > convoluted combination of writeback flags to an inode/mapping
> > that more filesystems than not can get wrong?
>
> Good point. Coupled with metadata/data ordering and your argument
> below, that's a decent argument against exposing this interface.
>
> > FWIW, sys_sync_file_range() is fundamentally broken for data
> > integrity writeback - at no time does it call a filesystem method
> > that can result in a barrier I/O being issued to disk after
> > writeback is complete. So, unlike fsync() or fdatasync(), the data
> > can still be lost after completion due to power failure on drives
> > with volatile write caches....
>
> Seems to be true. I'm not currently concerned with sync_file_range
> for data integrity, so I'm going to punt on this issue.
;)
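A minimal sketch of the distinction drawn above, assuming Linux with glibc (`_GNU_SOURCE` exposes `sync_file_range()`). The helper name `flush_range_durably` is hypothetical and not from the thread. `sync_file_range()` can start and wait on writeback for a byte range. As described above, though, it never reaches a filesystem method that could flush metadata or issue a barrier. An application that actually needs integrity would therefore still follow it with `fdatasync()`, which goes through the filesystem's fsync path:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: push [off, off + len) to disk, then fdatasync().
 * Returns 0 on success, -1 on error with errno set.
 */
static int flush_range_durably(int fd, off_t off, off_t len)
{
    /* Wait for any in-flight writeback, start writeback of the dirty
     * pages in the range, then wait for it to complete. */
    unsigned int flags = SYNC_FILE_RANGE_WAIT_BEFORE |
                         SYNC_FILE_RANGE_WRITE |
                         SYNC_FILE_RANGE_WAIT_AFTER;

    if (sync_file_range(fd, off, len, flags) != 0)
        return -1;

    /* sync_file_range() alone does not call into the filesystem's fsync
     * path, so neither the metadata needed to find the data nor a drive
     * cache flush is guaranteed. fdatasync() gives the filesystem the
     * chance to do both (whether a barrier is issued is fs-dependent). */
    return fdatasync(fd);
}
```

The `sync_file_range()` call is optional here. Its only benefit is overlapping writeback with other work when `SYNC_FILE_RANGE_WRITE` is issued early on its own. The integrity guarantee, such as it is, comes from `fdatasync()`.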
> If the consensus is against exposing a "sync metadata" interface, I'm
> fine with ext2 silently updating metadata alongside neighboring data
> in *either* posix_fadvise() or sync_file_range.
I think that sync_file_range is the better choice for "correct"
behaviour. When syncing data explicitly, there is an implicit
assumption that the metadata needed to reference that data is
written to disk as well.
> Either way, does it
> seem reasonable for posix_fadvise(DONTNEED) to call
> __filemap_fdatawrite_range to do its work?
From a kernel perspective, I don't think it really matters. To an
application, it could. e.g. if you're calling posix_fadvise() on a
large range, then the I/O patterns will be the same either way. If
you're calling posix_fadvise() on small, sparse ranges of the file,
then you'll turn one large, fast writeout into lots of small random
writes. i.e. upgrade the kernel and the application goes much
slower....
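A minimal sketch of the application-side pattern that stays cheap under either kernel behaviour. It writes a contiguous run of records and then issues one `POSIX_FADV_DONTNEED` over the whole run, rather than one call per small record. `write_then_drop` is a hypothetical name. The sketch assumes that DONTNEED may initiate writeback of dirty pages in the advised range, which is the behaviour under discussion; it makes no claim about what any particular kernel does:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: append `nrec` records of `reclen` bytes at the
 * current file offset, then advise DONTNEED once over everything written.
 * Returns 0 on success or a positive errno value on failure.
 */
static int write_then_drop(int fd, const char *rec, size_t reclen, int nrec)
{
    if (nrec <= 0)
        return 0; /* a zero len to posix_fadvise() would mean "to EOF" */

    off_t start = lseek(fd, 0, SEEK_CUR);
    if (start == (off_t)-1)
        return errno;

    for (int i = 0; i < nrec; i++) {
        ssize_t n = write(fd, rec, reclen);
        if (n < 0)
            return errno;
        if ((size_t)n != reclen)
            return EIO; /* treat a short write as an error in this sketch */
    }

    /* One advisory call over the contiguous range just written. If the
     * kernel starts writeback here, it sees one large range instead of
     * many small ones. Note that posix_fadvise() returns the error
     * number directly rather than setting errno. */
    return posix_fadvise(fd, start, (off_t)reclen * nrec, POSIX_FADV_DONTNEED);
}
```

Calling `posix_fadvise()` inside the loop, once per record, is the small-sparse-range pattern described above. With range-limited writeback, each of those calls could become its own small write.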
I guess this all depends on whether this would be considered a
regression or a stupid application ;)
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
Thread overview: 8+ messages
2008-10-31 20:54 Metadata in sys_sync_file_range and fadvise(DONTNEED) Chad Talbott
2008-11-01 9:21 ` Andrew Morton
2008-11-06 0:56 ` Chad Talbott
2008-11-06 1:07 ` Andrew Morton
2008-11-06 1:27 ` Chad Talbott
2008-11-02 22:45 ` Dave Chinner
2008-11-06 1:19 ` Chad Talbott
2008-11-06 4:20 ` Dave Chinner [this message]